THE ANNALS
OF
MATHEMATICAL
STATISTICS


FOUNDED AND EDITED BY H. C. CARVER, 1930-1938
EDITED BY S. S. WILKS, 1938-1949


THE OFFICIAL JOURNAL OF THE INSTITUTE
OF MATHEMATICAL STATISTICS










SOME CONTINUOUS MONTE CARLO METHODS FOR THE DIRICHLET
PROBLEM¹
By Mervin E. Muller²
University of California, Los Angeles, and Cornell University

0. Summary. Monte Carlo techniques are introduced, using stochastic models 
which are Markov processes. This material includes the N-dimensional Spherical, 
General Spherical, and General Dirichlet Domain processes. These processes are 
proved to converge with probability 1, and thus to yield direct statistical esti- 
mates of the solution to the N-dimensional Dirichlet problem. The results are 
obtained without requiring any further restrictions on the boundary or the func- 
tion defined on the boundary, in addition to those required for the existence and 
uniqueness of the solution to the Dirichlet problem. A detailed study is made for
the N-dimensional Spherical process; this includes a study of the order of the 
average number of steps required for convergence. Asymptotic confidence in- 
tervals are obtained. When computing effort is measured in terms of the order 
of the average number of steps required for convergence, the often-made con- 
jecture that the computing effort of a Monte Carlo procedure should be a linear 
function of the dimensionality of the problem is shown to be true for the cases 
considered. Comments are included regarding the application of these processes 
on digital computers, and truncation methods are suggested. 


1. Introduction. Throughout this paper, $D$ will denote a bounded finitely connected $N$-dimensional domain in a Euclidean space. Further, $\Gamma[D]$ will denote the boundary points of $D$. A point in space will be denoted by $x$, where $x$ has coordinates $(x_1, x_2, \cdots, x_N)$.

The $N$-dimensional Dirichlet problem. Given the domain $D$ and a continuous function $f(x)$ defined on the boundary $\Gamma[D]$, the $N$-dimensional Dirichlet problem consists in finding a function $u(x)$ continuous in $D + \Gamma[D]$, reducing to $f(x)$ on $\Gamma[D]$, and having in $D$ continuous partial derivatives of second order which satisfy Laplace's differential equation, i.e., in finding $u(x)$ such that

$$\Delta u(x) \equiv \sum_{i=1}^{N} \frac{\partial^2 u(x)}{\partial x_i^2} = 0, \qquad x \in D, \tag{1.1}$$

$$u(x) = f(x), \qquad x \in \Gamma[D]. \tag{1.2}$$

Received May 6, 1955. 

1 Sections of this paper are part of a Doctoral Dissertation written at the University of 
California, Los Angeles. The research was sponsored, in part, by the National Bureau of 
Standards while the author was a Fellow in Mathematics at the Institute for Numerical 
Analysis, and the preparation of the paper was sponsored, in part, by the Office of Naval 
Research. 

2 Now with the International Business Machines Corp., New York, N. Y. 


569 





570 MERVIN E. MULLER 


Some writers have generalized the Dirichlet problem to allow the following: 
In place of (1.1), the general linear elliptic operator 


$$L(u) = \sum_{i,j=1}^{N} a_{ij}(x) \frac{\partial^2 u}{\partial x_i \partial x_j} + \sum_{j=1}^{N} b_j(x) \frac{\partial u}{\partial x_j}$$

is introduced, where $a_{ij}(x)$, $b_j(x)$ possess continuous second-order derivatives in $D$; while in place of (1.2), they have $u(x) = g(x)$, $x \in \Gamma[D]$, where $g(x)$ is permitted to have points of discontinuity on $\Gamma[D]$. These generalizations of the Dirichlet problem will not be considered in this paper, even though some of the material developed is applicable to the more general problem.

In essence, the Monte Carlo method is experimental. The tempting title 
“Monte Carlo” is being used here as it has been by others, e.g., by Metropolis 
and Ulam [17], in order to convey that an unknown solution to a given physical 
problem is being estimated by a method which essentially depends on a statis- 
tical sampling technique. This approach requires the utilization of random varia- 
bles of an appropriate stochastic process such that samples of the process yield 
valid statistical estimates of the desired unknown quantities.

The previous Monte Carlo studies on the Dirichlet problem (for a review of the 
literature see Curtiss [2]) have given estimators for the solution to discrete re- 
placements of the Dirichlet problem, since they initially replaced the given do- 
main D by a network of points and replaced the differential operator by a dif- 
ference operator. Further, the previous studies needed to impose assumptions in 
addition to those made here concerning the regularity of the boundary of D. It 
will be seen that the processes considered here yield direct statistical estimates 
of the solution to the Dirichlet problem. Consequently, the only inherent possible 
source of error with these processes results from the statistical fluctuation of the 
estimators. Naturally, if these methods are utilized on a digital computer there 
will be an additional "round-off" error due to the replacement of a continuous
variable by a variable which possesses only a discrete number of digits. 

Since the one-dimensional Dirichlet problem reduces to finding a straight line 
through two points, it will be completely omitted from subsequent consideration 
in this paper. 

We next consider the question of the nature of $\Gamma[D]$. It has already been demonstrated that the theory of probability, as such, can be used as a rigorous mathematical tool in the study of differential and integral equations (see, for example,
Feller [7] and Kac [9]). However, the contribution that Monte Carlo studies 
might make to boundary value problems, when the analytical questions of exist- 
ence and uniqueness of a solution have not been settled, is a moot question. Con- 
sequently, we make the following assumption: 

Fundamental Assumption. Throughout this paper, the boundary $\Gamma[D]$ will be assumed to be of sufficient regularity to ensure that the Dirichlet problem has a solution and that it is unique. Kellogg [12], [13] gives a detailed discussion and bibliography concerning regularity conditions on $\Gamma[D]$; see also de La Vallée Poussin [22].





MONTE CARLO METHODS 


2. Definition of the Spherical process. First consider a preliminary definition. 

DEFINITION 2.1. Maximum $N$-sphere: $K(x)$. Given a domain $D$ with boundary $\Gamma[D]$, and any point $x$ belonging to $D + \Gamma[D]$, then $K(x)$ is the maximum $N$-dimensional sphere with center $x$ and radius $r$ if $r = \inf_{x' \in \Gamma[D]} \|x' - x\|$; $\dot K(x)$ denotes the surface of $K(x)$, where $K(x)$ is empty if $x \in \Gamma[D]$.

DEFINITION 2.2. The Spherical process. Given a domain $D$ of $N$ dimensions with boundary $\Gamma[D]$, and any point $x$ belonging to $D + \Gamma[D]$, then the $N$-dimensional Spherical process originating from $x$ is $\Phi(x)$, where

(A) $\Phi(x) = \{S(x, \varphi), 0 \le \varphi \le 1\}$; i.e., $\Phi(x)$ is the totality of all sequences of points $S(x, \varphi)$, $0 \le \varphi \le 1$, where

(B) Each value of $\varphi$ specifies a sequence of points $S(x, \varphi) = \{P_{i+1}(x, \varphi), i = 0, 1, \cdots\}$ generated according to the following stipulations:
(1) About the point $P_0(x, \varphi) = x$, determine the maximum $N$-sphere $K(P_0)$;
(2) Select the point $P_1(x, \varphi)$ uniformly at random on the surface of $K(P_0)$;
(3) The point $P_{i+1}(x, \varphi)$ is determined recursively from $P_i(x, \varphi)$ and $K(P_i)$ in the same manner as $P_1(x, \varphi)$ was determined from $P_0(x, \varphi)$.

In introducing the N-dimensional Spherical process, we have actually set up 
a probability space whose underlying points are sequences of directions picked 
uniformly at random and picked so as to be mutually independent. Probabilities 
are defined as follows. Corresponding to the $j$th direction, i.e., the direction from the point $P_{j-1}(x, \varphi)$ to the point $P_j(x, \varphi)$, there is a point $Q_j$ on the surface $\dot F$ of the unit $N$-sphere $F$. If $F_j$ is any Lebesgue measurable subset of $\dot F$ of measure $m(F_j)$, then picking the $j$th direction uniformly at random is equivalent to setting $\Pr\{Q_j \in F_j\} = m(F_j)/m(\dot F)$. Picking the directions to be mutually independent is equivalent to setting

$$\Pr\{Q_1 \in F_1, Q_2 \in F_2, \cdots, Q_s \in F_s\} = \prod_{j=1}^{s} \Pr\{Q_j \in F_j\}$$


for each s. It then follows, by the Extension Theorem due to Kolmogorov [14] 
(see page 29), that the probability distribution on the space of infinite sequences 
will be properly defined. Further, it will be seen that the Spherical process is a 
Markov process with a discrete parameter. 
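The construction in Definition 2.2, with directions chosen uniformly and independently as just described, can be sketched in a few lines of Python. This is a minimal illustration under assumptions not taken from the paper: the domain is supplied through a function `dist_to_boundary` (a hypothetical interface) returning $\inf_{x' \in \Gamma[D]} \|x' - p\|$, a uniform direction is obtained by normalizing a vector of independent standard normal variates, and the infinite sequence is cut off once the maximum sphere has radius below a tolerance `eps`, in the spirit of the truncation methods mentioned in the Summary.

```python
import math
import random


def uniform_direction(n, rng):
    """Point Q_j uniformly distributed on the surface of the unit N-sphere.

    A vector of independent standard normal variates has a rotation-invariant
    distribution, so normalizing it gives a uniform direction.
    """
    while True:
        g = [rng.gauss(0.0, 1.0) for _ in range(n)]
        norm = math.sqrt(sum(c * c for c in g))
        if norm > 0.0:
            return [c / norm for c in g]


def spherical_process(x, dist_to_boundary, eps, rng, max_steps=100_000):
    """Generate P_0 = x, P_1, P_2, ... of the Spherical process (Definition 2.2).

    dist_to_boundary(p) must return the radius of the maximum N-sphere K(p),
    i.e. the distance from p to Gamma[D].  Each new point is the old one plus
    that radius times an independent uniform direction.  The sequence is cut
    off once the radius is below eps.  Returns (last point, number of steps).
    """
    p = list(x)
    for step in range(max_steps):
        r = dist_to_boundary(p)
        if r < eps:
            return p, step
        q = uniform_direction(len(p), rng)
        p = [a + r * b for a, b in zip(p, q)]
    return p, max_steps


def unit_ball_distance(p):
    """Distance from a point of the closed unit ball to the unit sphere."""
    return 1.0 - math.sqrt(sum(c * c for c in p))
```

For example, `spherical_process([0.0, 0.0, 0.0], unit_ball_distance, 1e-4, random.Random(1))` returns a point within `1e-4` of the unit sphere together with the number of steps taken.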

We next consider the Einstein-von Smoluchowski stochastic model of Brownian 
motion, since this model is of help in showing that the Spherical process furnishes 
a valid direct statistical estimate of the solution to the Dirichlet problem. 


3. The Brownian motion process. Let $(\Omega, \mathcal{F}, \Pr)$ be a probability space, i.e., $\Omega = \{\omega\}$ is a set of elements $\omega$, $\mathcal{F} = \{E\}$ is a Borel field of subsets $E$ of $\Omega$, and $\Pr(E)$ is a countably additive measure defined on $\mathcal{F}$ with normalization $\Pr(\Omega) = 1$.

Throughout this paper $X(t, \omega)$ will denote the well-known $N$-dimensional Brownian motion process starting from $x$, i.e.,

$$X(t, \omega) = \{(x^1(t, \omega), x^2(t, \omega), \cdots, x^N(t, \omega)) \mid 0 \le t < \infty,\ \omega \in \Omega\}$$







is the Cartesian product of N mutually independent one-dimensional Brownian 
motion processes satisfying the condition that 


$$X(0, \omega) = x = (x^1(0, \omega), x^2(0, \omega), \cdots, x^N(0, \omega)).$$


For a detailed definition of this process, see, for example, Doob [4], page 97, or 
Dvoretzky, Erdős, and Kakutani [6].

It is assumed that the basic probability measure is completed in such a way that 
Theorem 3.1, to be introduced, is valid (see Doob [3] or [4]). 

A sample function or path of the process $X(t, \omega)$ is a function of $t$, defined for $0 \le t < \infty$ and obtained by fixing $\omega$. When speaking of "almost all sample functions," this is to be understood to mean "almost all $\omega$."

We have defined the particular Brownian motion process with $X(0, \omega) = x$, where $x$ is a point in $N$-space, so that in the sequel we can speak of a sample function
originating from a point x which is of interest. We shall now consider some of the 
known and, in particular, new properties of the Brownian motion process that 
will be pertinent. It will be seen that all subsequent material of this section will 
rest on the following important theorem due to Wiener. A recent version is found 
in Paley and Wiener [19]. A proof may also be found in Doob [3] or [4] or in Lévy 
[15] or [16]. 

THEOREM [Wiener] 3.1. Almost all sample functions of the Brownian motion process are continuous; i.e., the subset $\Omega_0$ of $\Omega$, consisting of all $\omega$ for which $X(t, \omega)$ is a continuous function of $t$ for $0 \le t < \infty$, is $\mathcal{F}$-measurable and $\Pr(\Omega_0) = 1$.

The first-passage time of the process is defined in the following manner. Given any point $x$ belonging to a domain $D$ with boundary $\Gamma[D]$ and any $\omega$ belonging to $\Omega$, consider the sample function of the process $X(t, \omega)$ originating from $x$. If there exists a positive number $\tau = \tau(x, \Gamma[D], \omega)$ such that $X(\tau, \omega) \in \Gamma[D]$ and $X(t, \omega) \notin \Gamma[D]$ for any $t$ with $0 \le t < \tau$, then $\tau(x, \Gamma[D], \omega)$ is called the first-passage time to the boundary $\Gamma[D]$ for the sample function originating at the point $x$.

If $\tau(x, \Gamma[D], \omega)$ exists, then $P(x, \Gamma[D], \omega)$ denotes the point at which the sample function $X(t, \omega)$, originating from $x$, intersects $\Gamma[D]$ for the first time after $t = 0$. $P(x, \Gamma[D], \omega)$ is called the point of first intersection.
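The first-passage time and the point of first intersection can be approximated by simulating a path on a time grid. The sketch below assumes standard Brownian motion (independent $N(0, dt)$ increments in each coordinate) and a domain given by a membership test `inside` (a hypothetical interface). Because the path is only observed at multiples of `dt`, the returned time is biased upward and the returned point lies just outside $D$, so the output only approximates $\tau(x, \Gamma[D], \omega)$ and $P(x, \Gamma[D], \omega)$.

```python
import math
import random


def first_passage(x, inside, dt, rng, t_max=100.0):
    """Grid approximation of the first-passage time and first-intersection point.

    Each step adds an independent N(0, dt) increment to every coordinate.
    The loop stops at the first grid time at which the path is no longer in D
    (or at t_max, as a safeguard), so the time errs on the late side and the
    point lies just outside D; both errors shrink as dt decreases.
    """
    p = list(x)
    t = 0.0
    sd = math.sqrt(dt)
    while inside(p) and t < t_max:
        p = [c + rng.gauss(0.0, sd) for c in p]
        t += dt
    return t, p


def in_unit_ball(p):
    """Membership test for the open unit ball."""
    return sum(c * c for c in p) < 1.0
```

For standard Brownian motion started at the center of the unit ball in $N$ dimensions, $E\,\tau = 1/N$ (because $\|X(t, \omega)\|^2 - Nt$ is a martingale), which gives a simple check of the simulation: with $N = 2$ the average simulated first-passage time should be close to $0.5$.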

Following Kakutani [10] and [11], let $\Omega(x, \Gamma[D])$ denote the set of $\omega \in \Omega$ such that $\tau(x, \Gamma[D], \omega)$ exists. His results imply that $\Omega(x, \Gamma[D])$ is a measurable subset of $\Omega$ and that $\tau(x, \Gamma[D], \omega)$ is a real-valued measurable function of $\omega$ on $\Omega(x, \Gamma[D])$.


His material yields the following theorem, which in the form presented here can 
also be found in earlier studies by Bachelier, Lévy, and Wiener. 

THEOREM [Kakutani] 3.2. Given any point $x$ belonging to a domain $D$ with boundary $\Gamma[D]$, then almost all sample functions of the Brownian motion process on $D$ originating at $x$ intersect the boundary $\Gamma[D]$, i.e., $\Pr\{\Omega(x, \Gamma[D])\} = 1$.

The next two theorems are due to Kakutani [11]. The only conditions to be imposed on the domain $D$ and its boundary $\Gamma[D]$ are that they be regular for the Dirichlet problem (see the Introduction). A subset of the boundary $\Gamma[D]$







will be called an elementary set if it consists of a finite number of mutually disjoint nonabutting simple surfaces on the boundary $\Gamma[D]$, including or excluding their closures, where a simple surface is a homeomorphic image of the surface of an $N$-sphere.

THEOREM [Kakutani] 3.3. Given a domain $D$ and its boundary $\Gamma[D]$, and $E$ an elementary set on the boundary $\Gamma[D]$, then the probability, $\Pr(x, E, D)$, that the Brownian motion process originating from a point $x$ belonging to $D$ will intersect the set $E$ for some $t > 0$, without intersecting $\Gamma[D] - E$ before it, is a harmonic function of $x$ in $D$, and $\lim_{x \in D,\, x \to x_0} \Pr(x, E, D) = 1$ or $0$ according as $x_0$ is an inner point of $E$ or of $\Gamma[D] - E$.

THEOREM [Kakutani] 3.4. With the same conditions as given in Theorem 3.3, let $f(x)$ be a real-valued continuous function defined on the boundary $\Gamma[D]$. Then, for any point $x_0$ belonging to $D$, the value $u(x_0)$ of the solution $u(x)$ of the Dirichlet problem for the domain $D$ and the boundary value function $f(x)$ is obtained by taking the integral of a Poisson type of $f(x)$ with respect to the kernel $\Pr(x_0, E, D)$ on $\Gamma[D]$, or by taking the mathematical expectation of the composed function $f(P(x_0, \Gamma[D], \omega))$:

$$u(x_0) = \int_{\Gamma[D]} \Pr(x_0, dx, D)\, f(x) = \int_{\Omega} f(P(x_0, \Gamma[D], \omega))\, d\omega.$$


The details of the proofs of Kakutani’s theorems, and certain generalizations, 
may be found in the recent paper by Doob [5]. 
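For a concrete instance of the kernel form in Theorem 3.4, take $N = 2$ and $D$ the unit disk, where $\Pr(x_0, dx, D)$ has the classical Poisson-kernel density $(1 - \|x_0\|^2)/(2\pi \|x_0 - y\|^2)$ with respect to arc length on the unit circle. The sketch below (an illustration chosen here, not an example from the paper) evaluates the integral with the trapezoid rule, which is very accurate for smooth periodic integrands. With the harmonic boundary data $f(y) = y_1^2 - y_2^2$ it should reproduce $u(x_0) = x_{0,1}^2 - x_{0,2}^2$.

```python
import math


def poisson_disk(x0, f, m=2000):
    """u(x0) = integral over the unit circle of Pr(x0, dx, D) f(x), D = unit disk.

    The kernel density (1 - |x0|^2) / (2 pi |x0 - y|^2) with respect to arc
    length is integrated against f with the m-point trapezoid rule.
    """
    a, b = x0
    r2 = a * a + b * b
    total = 0.0
    for k in range(m):
        th = 2.0 * math.pi * k / m
        y = (math.cos(th), math.sin(th))
        d2 = (a - y[0]) ** 2 + (b - y[1]) ** 2
        total += (1.0 - r2) / (2.0 * math.pi * d2) * f(y)
    return total * (2.0 * math.pi / m)
```

For instance, `poisson_disk((0.3, 0.4), lambda y: y[0] ** 2 - y[1] ** 2)` is $0.09 - 0.16 = -0.07$ up to rounding, and a constant boundary function is reproduced exactly, reflecting that the kernel is a probability measure on $\Gamma[D]$.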

The following definition of successive first intersections of the Brownian motion 
process will be useful. 

DEFINITION 3.1. Successive first intersections. Given any point $x$ belonging to a domain $D$ with boundary $\Gamma[D]$ and any $\omega$ belonging to $\Omega$, consider the sample function of the Brownian motion process $X(t, \omega)$ originating from $x$. If this sample function has a first intersection on the surface of $K(P_0)$, $P_0 = x$, denote this point as $P_1(x, K(P_0), \omega)$. Successive points $P_{i+1}(x, K(P_i), \omega)$, $i = 1, 2, \cdots$, of first intersection on the successive surfaces of $K(P_i)$, if these points exist, will be defined recursively as was done for the Spherical process (Definition 2.2). A sequence of successive first intersections associated with a particular sample function exists if $P_{i+1}(x, K(P_i), \omega)$ exists for each $i$, $i = 0, 1, \cdots$. Let $T(x, \omega)$ denote this sequence, i.e., $T(x, \omega) = \{P_{i+1}(x, K(P_i), \omega), i = 0, 1, 2, \cdots\}$.

Before making use of Definition 3.1, the following remarks are in order. (1) If $x$ is the center of any $N$-dimensional sphere, say $S(x)$, then the points of first intersection, $P(x, S(x), \omega)$, on the surface of $S(x)$, for the sample functions of the Brownian motion process originating from $x$, are uniformly distributed on the surface of $S(x)$, the distribution being taken over $\omega \in \Omega(x, S(x))$. (2) If the Brownian motion process has a first intersection, say for $t = \tau(\omega) = \tau(x, F, \omega)$, with a closed boundary $F$ (of a specified type), with probability 1, then the process $X(s + \tau(\omega), \omega)$ in the new parameter variable $s$ is again a Brownian motion process. Moreover, the difference process, $X(s + \tau(\omega), \omega) - X(\tau(\omega), \omega)$, is independent of the original process $X(t, \omega)$ for $0 \le t \le \tau(\omega)$.







As intuitively obvious as these statements are, their proofs require measure-theoretic considerations involving function spaces. Since the literature does not
include detailed proofs of the above remarks, the author wishes to express his 
appreciation to Professor G. Hunt for demonstrating their proofs in a private 
communication. 

By Theorems 3.1 and 3.2, the totality of $\omega$'s having continuous sample functions $X(t, \omega)$ originating from $x$ and intersecting $\Gamma[D]$ has measure 1; i.e.,

$$\Pr\{\Omega(x, \Gamma[D])\} = 1.$$

Consequently, the proof of the following theorem can be obtained by restricting $\omega \in \Omega(x, \Gamma[D])$ and by completing an induction argument which makes use of the facts mentioned in remark (2).

THEOREM 3.5. Given any point $x$ belonging to a domain $D$ with boundary $\Gamma[D]$, a sequence $T(x, \omega)$ of successive first intersections exists for almost all sample functions of the Brownian motion process originating from $x$.

THEOREM 3.6. Given any point $x$ belonging to a domain $D$ with boundary $\Gamma[D]$, then with probability 1, the sequence $T(x, \omega) = \{P_{i+1}(x, K(P_i), \omega), i = 0, 1, 2, \cdots\}$ of successive first intersections corresponding to the sample function $X(t, \omega)$ of the Brownian motion process originating from $x$ converges to a point of the boundary $\Gamma[D]$. This point of convergence coincides with the point where the sample function first intersects the boundary.

PROOF. We restrict $\omega$ to that subset of $\Omega$ for which the sample functions originating from the given point $x$ are continuous, intersect the boundary $\Gamma[D]$ for finite values of $t$, and have sequences $T(x, \omega)$. From Theorem 3.5, this subset of $\Omega$ is $\Omega(x, \Gamma[D])$ and it has measure 1. Hence, to prove the theorem, we need only to prove it to be true for all $\omega \in \Omega(x, \Gamma[D])$. So we now select any $\omega \in \Omega(x, \Gamma[D])$ and let $t_0$ be the finite value of $t$ for which the sample function first intersected the boundary $\Gamma[D]$, say at the point $z$. Using three major steps, we will now prove that the corresponding sequence of points $T(x, \omega) = \{P_{i+1}(x, K(P_i), \omega), i = 0, 1, \cdots\}$ converges to the point $z$.

1. The sequence $T(x, \omega) = \{P_{i+1}(x, K(P_i), \omega), i = 0, 1, \cdots\}$ converges. In $T(x, \omega)$, the $t_i$'s, $i = 1, 2, \cdots$, of the successive first intersections are monotone increasing and bounded by $t_0$. Hence, the $t_i$'s are convergent, and $T(x, \omega)$ converges by continuity, since $X(t, \omega)$ is continuous.

2. The limit point of the sequence $T(x, \omega)$ is a boundary point. Assume the contrary, i.e., that the limit point given by Step 1, say $x_0$, is an interior point of the domain $D$. Then there exists a $d > 0$, where $d = \inf_{x' \in \Gamma[D]} \|x' - x_0\|$. Since $x_0$ is the limit point for the sequence, there exists an $i_0$ such that for all $i > i_0$, $\|P_{i+1}(x, K(P_i), \omega) - x_0\| < d/4$. Consider any $i > i_0 + 1$. Then

$$\|P_i(x, K(P_{i-1}), \omega) - x_0\| < d/4,$$

and the maximum $N$-sphere about $P_i(x, K(P_{i-1}), \omega)$, i.e., the sphere $K(P_i)$, has a radius $r_i = \inf_{x' \in \Gamma[D]} \|P_i(x, K(P_{i-1}), \omega) - x'\| > 3d/4$. But by Definition







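3.1, the point $P_{i+1}(x, K(P_i), \omega)$ must lie on the surface of $K(P_i)$, so that $\|P_{i+1}(x, K(P_i), \omega) - x_0\| \ge r_i - \|P_i(x, K(P_{i-1}), \omega) - x_0\| > 3d/4 - d/4 = d/2$; and since $i > i_0$, the point $P_{i+1}(x, K(P_i), \omega)$ must also satisfy the condition

$$\|P_{i+1}(x, K(P_i), \omega) - x_0\| < d/4.$$

This is impossible. Hence, $x_0$ cannot be an interior point; therefore, the limit point lies on the boundary $\Gamma[D]$.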

3. To prove that the limit point, $x_0$, of the sequence $T(x, \omega)$ must be the point $z$ where the sample function $X(t, \omega)$ first intersects $\Gamma[D]$, i.e., $X(t_0, \omega) = z$, we will assume the contrary. By Step 2, we know $x_0 \in \Gamma[D]$. Thus, we assume $x_0 \ne z$ and $x_0 \in \Gamma[D]$; further, let $\bar t$ be the corresponding limiting value of the $t_i$'s in $T(x, \omega)$. By Step 1, $\bar t \le t_0$, and $\bar t = t_0$ would give $x_0 = X(t_0, \omega) = z$ by continuity; hence, if $x_0 \ne z$, then $\bar t < t_0$. When the successive $t_i$'s of $T(x, \omega)$ converge to $\bar t$, the corresponding $X(t_i, \omega)$'s converge to $x_0$, and by continuity, $x_0 = X(\bar t, \omega)$. But by assumption, $z$ was the point of first intersection of $X(t, \omega)$ with the boundary $\Gamma[D]$, so the path cannot meet $\Gamma[D]$ at the earlier time $\bar t < t_0$. This contradiction shows that $x_0 = z$. Therefore, $T(x, \omega)$ must converge to $z$, and the proof is completed.

With the relevant properties of the Brownian motion process taken into ac- 
count, we now exhibit the relationship between the Brownian motion process and 
the Spherical process that will be useful. We need only consider the mapping in 
one direction, namely, that to each sequence $T(x, \omega)$ of the Brownian motion process there is a sequence $S(x, \varphi)$ of the Spherical process.

Using Theorem 3.5, remarks (1) and (2) concerning the Brownian motion process, and the definition of the Spherical process, an induction argument yields the following desired result:

THEOREM 3.7. Given any point $x$ belonging to a domain $D$ with boundary $\Gamma[D]$, then the probability distribution of the sequences of successive first intersections $T(x, \omega) = \{P_{i+1}(x, K(P_i), \omega), i = 0, 1, 2, \cdots\}$, of the sample functions of the Brownian motion process originating from $x$, is the same as the probability distribution of the sequences of the successive points $S(x, \varphi) = \{P_{i+1}(x, \varphi), i = 0, 1, 2, \cdots\}$ of the Spherical process originating from $x$.


Thus the results developed for the sequences $T(x, \omega)$ can be reinterpreted for the Spherical process. Consequently, we shall next show that the Spherical process converges with probability 1 and yields the solution for any given Dirichlet problem.


4. Solution of the Dirichlet problem by the Spherical process. 

THEOREM 4.1. Given any point $x$ belonging to a domain $D$ with boundary $\Gamma[D]$, then with probability 1, the Spherical process originating from $x$ converges to a point of the boundary $\Gamma[D]$.

PROOF. The proof will follow by applying Theorems 3.6 and 3.7. By Theorem 3.7 we know that to each sequence $T(x, \omega)$ of successive first intersections of the Brownian motion process, we can associate a sequence $S(x, \varphi)$ of the Spherical process. Hence, with probability 1, the sequence $S(x, \varphi)$ must converge to the boundary $\Gamma[D]$, since by Theorem 3.6, the sequences $T(x, \omega)$ converge with probability 1 to the boundary $\Gamma[D]$.







THEOREM 4.2. Given a domain $D$ with boundary $\Gamma[D]$ and an elementary set $E$ of the boundary $\Gamma[D]$, then the probability, $\Pr(S(x, \varphi), E, D)$, that the Spherical process originating from a point $x$ of the domain $D$ will converge to the set $E$, without having converged to $\Gamma[D] - E$ before it, is a harmonic function of $x$ in $D$ and $\lim_{x \in D,\, x \to x_0} \Pr(S(x, \varphi), E, D) = 1$ or $0$ according as $x_0$ is an inner point of $E$ or of $\Gamma[D] - E$.

PROOF. It is correct to speak of the distribution of the points of convergence on $\Gamma[D]$ of the Spherical process, since by Theorem 4.1, the Spherical process converges to $\Gamma[D]$ with probability 1. By Theorem 3.7 we know that the distribution of these points of convergence must be the same as that for the sequences $\{T(x, \omega), \omega \in \Omega(x, \Gamma[D])\}$. By Theorem 3.6, the distribution of the points of convergence of the sequences $\{T(x, \omega), \omega \in \Omega(x, \Gamma[D])\}$ to $\Gamma[D]$ is the same as the distribution of points of first intersection on $\Gamma[D]$ for the Brownian motion process. Theorem 3.3 then yields this theorem immediately.

Exactly as Theorem 3.4 follows from Theorem 3.3, a similar result in light 
of Theorem 4.2 can be stated for the Spherical process. 

THEOREM 4.3. With the same conditions as given in Theorem 4.2, let $f(x)$ be a real-valued continuous function defined on the boundary $\Gamma[D]$. Then for any point $x_0$ belonging to $D$, the value $u(x_0)$ of the solution $u(x)$ of the Dirichlet problem for the domain $D$ and the boundary value function $f(x)$ is obtained by taking the integral, of a Poisson type, of $f(x)$ with respect to the kernel $\Pr(S(x_0, \varphi), E, D)$ on $\Gamma[D]$ or by taking the mathematical expectation of the composed function $f(P(x_0, \Gamma[D], \omega))$:

$$u(x_0) = \int_{\Gamma[D]} \Pr(S(x_0, \varphi), dx, D)\, f(x) = \int_{\Omega} f(P(x_0, \Gamma[D], \omega))\, d\omega.$$
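Theorem 4.3 turns directly into an estimator: run independent sequences of the Spherical process from $x_0$ and average $f$ at their end points. The sketch below assumes the unit ball in $N = 3$, truncation once the maximum sphere has radius below `eps`, and a helper `project` (hypothetical, named here for illustration) that moves the final point to a nearby boundary point before $f$ is evaluated. The interval returned is the ordinary normal-approximation (central limit theorem) interval, which may differ from the asymptotic confidence intervals developed in the paper. The boundary function $f(x) = x_1^2 - x_2^2 + x_3$ is harmonic, so the exact answer is $u(x_0) = f(x_0)$.

```python
import math
import random


def uniform_direction(n, rng):
    """Uniform unit vector in R^n via a normalized standard Gaussian vector."""
    while True:
        g = [rng.gauss(0.0, 1.0) for _ in range(n)]
        norm = math.sqrt(sum(c * c for c in g))
        if norm > 0.0:
            return [c / norm for c in g]


def estimate_u(x0, f, dist, project, n_paths, eps, rng):
    """Estimate u(x0) by averaging f over end points of the Spherical process.

    dist(p) is the radius of the maximum sphere K(p); each path stops once it
    falls below eps, and project maps the final point to a nearby boundary
    point where f is evaluated.  Returns the sample mean and a
    normal-approximation 95% interval.
    """
    values = []
    for _ in range(n_paths):
        p = list(x0)
        r = dist(p)
        while r >= eps:
            q = uniform_direction(len(p), rng)
            p = [a + r * b for a, b in zip(p, q)]
            r = dist(p)
        values.append(f(project(p)))
    mean = sum(values) / n_paths
    var = sum((v - mean) ** 2 for v in values) / (n_paths - 1)
    half = 1.96 * math.sqrt(var / n_paths)
    return mean, (mean - half, mean + half)


def unit_ball_dist(p):
    """Radius of the maximum sphere about p inside the unit ball."""
    return 1.0 - math.sqrt(sum(c * c for c in p))


def project_to_sphere(p):
    """Nearest point of the unit sphere (p is close to it, never 0 here)."""
    norm = math.sqrt(sum(c * c for c in p))
    return [c / norm for c in p]


def f(x):
    """Harmonic boundary data, so the exact solution is u(x) = f(x)."""
    return x[0] ** 2 - x[1] ** 2 + x[2]
```

With $x_0 = (0.2, 0.1, 0.3)$ the target is $u(x_0) = 0.04 - 0.01 + 0.3 = 0.33$; a few thousand paths give an interval whose half-width is on the order of $10^{-2}$.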


5. Generalizations of the Spherical process. 

5.1. The Generalized Spherical process. An immediate generalization of the 
Spherical process would be an attempt to use spheres whose radii are not neces- 
sarily maximum. Specifically, the generalization of the Spherical process found 
in Section 2 is as follows. 

DEFINITION 5.1. The Generalized Spherical process. Given a domain $D$ of $N$ dimensions with boundary $\Gamma[D]$, and any point $x$ belonging to $D + \Gamma[D]$, then the Generalized $N$-dimensional Spherical process originating from $x$ is $\Phi^*(x)$, where

(A) $\Phi^*(x) = \{S(x, \varphi), 0 \le \varphi \le 1\}$; i.e., $\Phi^*(x)$ is the totality of all sequences of points $S(x, \varphi)$, $0 \le \varphi \le 1$, where

(B) Each value of $\varphi$ specifies a sequence of points $S(x, \varphi) = \{P_{i+1}(x, \varphi), i = 0, 1, 2, \cdots\}$ generated according to the following stipulations:

(1) About the point $P_0(x, \varphi) = x$, determine an $N$-sphere $K(P_0, \rho_0)$, where $\rho_0$ is the radius of the sphere and $\rho_0 = \lambda_0(\varphi)\, r_0$, $\epsilon \le \lambda_0(\varphi) \le 1$, for some $\epsilon > 0$, and $r_0$ is the radius of $K(P_0)$ of Definition 2.2;
(2) Select the point $P_1(x, \varphi)$ uniformly at random on the surface of $K(P_0, \rho_0)$;
(3) The point $P_{i+1}(x, \varphi)$ is determined recursively from $P_i(x, \varphi)$ and $K(P_i, \rho_i)$ in the same manner as $P_1(x, \varphi)$ was determined from $P_0(x, \varphi)$.

To show that the Generalized Spherical process furnishes a method of obtaining the solution for the Dirichlet problem requires only the most obvious restatement of the material developed for the Spherical process. The introduction of the requirement that $\epsilon \le \lambda_i(\varphi) \le 1$, $i = 0, 1, 2, \cdots$, $0 \le \varphi \le 1$, is to ensure that the process will not degenerate, i.e., converge to an interior point of the domain $D$. With this requirement, and making the obvious changes, an inspection of the proof of Theorem 3.6 shows the theorem to be valid in the present situation.
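A single transition of the Generalized Spherical process differs from Definition 2.2 only in the radius $\rho_i = \lambda_i(\varphi)\, r_i$. The planar sketch below draws $\lambda_i$ uniformly from $[\lambda_{\min}, 1]$ at each step (the parameter `lam_low`); this rule, the unit-disk domain, and the truncation tolerance are illustrative assumptions, since the definition allows any choice with $\epsilon \le \lambda_i(\varphi) \le 1$. In this example the shorter steps lengthen the walk but leave the estimate of a harmonic function unbiased.

```python
import math
import random


def generalized_walk(x0, lam_low, eps, rng):
    """One truncated path of the Generalized Spherical process in the unit disk.

    At each step r0 is the radius of the maximum circle about the current
    point; the circle actually used has radius lam * r0 with lam uniform on
    [lam_low, 1], and the next point is uniform on that circle.
    Returns (final point, number of steps).
    """
    p, steps = x0, 0
    while True:
        r0 = 1.0 - math.hypot(p[0], p[1])
        if r0 < eps:
            return p, steps
        rho = rng.uniform(lam_low, 1.0) * r0
        th = rng.uniform(0.0, 2.0 * math.pi)
        p = (p[0] + rho * math.cos(th), p[1] + rho * math.sin(th))
        steps += 1
```

Setting `lam_low=1.0` recovers the Spherical process; smaller values of `lam_low` increase the average number of steps, while the average of the harmonic function $x_1^2 - x_2^2$ at the (projected) end points still approaches its value at the starting point.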

We next consider the generalization of allowing the transitions to take place 
on surfaces other than spheres. 

5.2. The General Dirichlet Domain Process. We shall now specify the N- 
dimensional domains that will be acceptable in place of the N-dimensional 
spheres. 

DEFINITION 5.2. An admissible domain. Given an $N$-dimensional domain $D$ with boundary $\Gamma[D]$, then an $N$-dimensional domain, say $D_i$, with boundary $\Gamma[D_i]$ is admissible with respect to any point, say $P$, if $P \in D$ and if the following conditions are satisfied:

(A) $P \in D_i$ and $D_i \subset D$.

(B) The normal derivative of the Green's function for the domain $D_i$ is known on $\Gamma[D_i]$.

(C) The domain $D_i$, with respect to the point $P$, has the property that, for some $\epsilon > 0$, for every ray originating from the point $P$, the ratio of the distances along the ray from the point $P$ to the points of intersection of the ray with $\Gamma[D_i]$ and $\Gamma[D]$, respectively, is greater than $\epsilon$.

DEFINITION 5.3. General Dirichlet Domain process. Given an $N$-dimensional domain $D$ with boundary $\Gamma[D]$, and any point $x$ belonging to $D + \Gamma[D]$, then the $N$-dimensional General Dirichlet Domain process originating from $x$ is $B(x)$, where

(A) $B(x) = \{S(x, \varphi), 0 \le \varphi \le 1\}$; i.e., $B(x)$ is the totality of all sequences of points $S(x, \varphi)$, $0 \le \varphi \le 1$, where

(B) Each value of $\varphi$ specifies a sequence of points $S(x, \varphi) = \{P_{i+1}(x, \varphi), i = 0, 1, 2, \cdots\}$ generated according to the following stipulations:

(1) With respect to the point $P_0(x, \varphi) = x$, select any admissible domain, say $D_0(\varphi)$;
(2) Select a point $P_1(x, \varphi)$ on the boundary $\Gamma[D_0(\varphi)]$ of $D_0(\varphi)$ from the probability distribution satisfying the condition that the probability of $P_1(x, \varphi)$'s being in any Lebesgue measurable subset of $\Gamma[D_0(\varphi)]$ is equal to the Lebesgue integral, over the subset in question, of the normalized normal derivative of the Green's function of $D_0(\varphi)$ with pole at $P_0(x, \varphi)$;
(3) The point $P_{i+1}(x, \varphi)$ is determined recursively from $P_i(x, \varphi)$ and an admissible domain $D_i(\varphi)$ in the same manner as $P_1(x, \varphi)$ was determined from $P_0(x, \varphi)$.




The General Dirichlet Domain process furnishes a method for solving the 
Dirichlet problem. This is seen by reviewing the material developed for the 
Spherical process and making the necessary minor modifications. For example, formerly we used the fact that the probability distribution of first intersections on the surface of $K(P_i(\varphi))$ for the sample functions of the Brownian motion process originating from $P_i(\varphi)$ was uniform on the surface of $K(P_i(\varphi))$. We now would use the fact that the probability distribution of first intersections on the surface $\Gamma[D_i(\varphi)]$ of an admissible domain $D_i(\varphi)$ for sample functions of the Brownian motion process originating from $P_i(\varphi) \in D_i(\varphi)$ is the same as the probability distribution specified in Definition 5.3 for the given point $P_i(\varphi)$ and domain $D_i(\varphi)$. That the sample functions have this distribution follows from Theorem 3.3. With this result, and making the necessary minor modifications in the material given for the Spherical process, the development would be similar to that already presented. However, in order to ensure that the process will not degenerate and to ensure that a result comparable to Theorem 3.6 is attainable, requirement (C) of Definition 5.2 is used.
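In the plane, a disk that contains $P_i(\varphi)$ but is not centered at it is an admissible domain for which the distribution in Definition 5.3 is explicit: the normalized normal derivative of the disk's Green's function is the Poisson kernel, i.e., the harmonic measure seen from $P_i(\varphi)$. The sketch below samples it through conformal invariance, an approach chosen here for illustration and not one described in the paper: exit points from the center of the unit disk are uniform, and the disk automorphism $w \mapsto (w + a)/(1 + \bar a w)$ sends the center to $a$ while mapping the unit circle onto itself. Points are represented as Python complex numbers.

```python
import cmath
import math
import random


def disk_exit_point(p, c, radius, rng):
    """Sample where Brownian motion from p first meets the circle |z - c| = radius.

    p must lie strictly inside the disk.  A uniform point w on the unit circle
    is pushed forward by the automorphism w -> (w + a) / (1 + conj(a) w),
    which maps 0 to a = (p - c) / radius, giving the harmonic measure from a.
    """
    a = (p - c) / radius                                   # p in unit-disk coordinates
    w = cmath.exp(1j * rng.uniform(0.0, 2.0 * math.pi))    # uniform on the unit circle
    return c + radius * (w + a) / (1 + a.conjugate() * w)
```

As a check, for a harmonic function such as $\operatorname{Re}(z^2)$ the average over sampled exit points should approach its value at $p$. In a full General Dirichlet Domain process the disk at each step would be chosen to satisfy conditions (A) to (C) of Definition 5.2; that choice is left open here.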


6. Order of the number of steps. It is of interest to have some indication as to the order of the number of steps required for convergence of the $N$-dimensional Spherical process. A conservative indication can be obtained by studying first the convergence of the process to an infinite $(N-1)$-dimensional hyperplane in $N$-space. The study will also be useful for showing that the computing effort, when measured in terms of the order of the number of steps required, increases approximately as a linear function of the dimensionality of the problem.

In this section, $y_i = P_i(x, \varphi)$, $i = 0, 1, 2, \cdots$, will denote a point in $N$-space generated by the Spherical process with $\Gamma[D]$ an infinite $(N-1)$-dimensional hyperplane, where $y_0 = P_0(x, \varphi) = x$ is the initial starting point of the process. For convenience, it will be assumed that the coordinate system is so oriented with respect to the given hyperplane that $y_i$ also denotes the distance along a normal from the hyperplane to the point. From the definition of the Spherical process, $y_{i+1} = P_{i+1}(x, \varphi)$ lies on the surface of $K(P_i)$. Thus, with the assumed orientation of the hyperplane and the coordinate system, $\theta_{i+1}$ will denote the direction angle between the normal of the hyperplane passing through the point $y_i = P_i(x, \varphi)$ and the radius vector of the $N$-sphere $K(P_i)$ at the point $y_{i+1}$.

We then have

$$y_{i+1} = y_i\,(1 - \cos\theta_{i+1}), \tag{6.1}$$

where $0 \le \theta_{i+1} \le 2\pi$, $i = 0, 1, 2, \cdots$.

From symmetry, and since we shall only be interested in the distance from the given hyperplane, we can restrict the subsequent discussion to $\theta$ in the range $0 \le \theta \le \pi$.
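The recursion (6.1), $y_{i+1} = y_i(1 - \cos\theta_{i+1})$, is easy to simulate. In the sketch below (an illustration, not the paper's computation) $\cos\theta$ is taken as the component along the normal of a uniformly distributed unit vector, obtained by normalizing a Gaussian vector, and the walk is stopped once $y_i$ falls below a tolerance `eps`, which is an assumed truncation rule. Comparing the average number of steps for different $N$ previews the dependence on dimensionality studied below.

```python
import math
import random


def cos_theta(n, rng):
    """Cosine of the angle between a uniform random direction in R^n and a fixed
    axis (the normal toward the hyperplane): the first coordinate of a
    normalized standard Gaussian vector."""
    while True:
        g = [rng.gauss(0.0, 1.0) for _ in range(n)]
        norm = math.sqrt(sum(c * c for c in g))
        if norm > 0.0:
            return g[0] / norm


def steps_to_hyperplane(n, y0, eps, rng):
    """Iterate y_{i+1} = y_i * (1 - cos theta_{i+1}) until y_i < eps."""
    y, steps = y0, 0
    while y >= eps:
        y *= 1.0 - cos_theta(n, rng)
        steps += 1
    return steps
```

Averaging `steps_to_hyperplane(n, 1.0, 1e-3, rng)` over a few hundred runs gives a markedly larger value for $N = 8$ than for $N = 2$, in line with the slower expected decrease of $\log y_i$ in higher dimensions.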

THEOREM 6.1. Let $y_i = P_i(x, \varphi)$, $i = 0, 1, 2, \cdots$, be subject to the formulation leading to (6.1). Then the $N$-dimensional Spherical process has the property that, with the boundary $\Gamma[D]$ being an infinite $(N-1)$-dimensional hyperplane






in N-space, the expected value, E {log (yis1 / yi) | N}, i = 0, 1, 2, --+ , ts negative 
for any value N and is given as follows: 


, ‘ Yis ‘ N- 
(6.2) E2 log ” 'IN) = log 2 + v( = 


forz = 0 


Aye *** ; 


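where $\psi(z)$ is the psi, or digamma, function, i.e., $\psi(z) = \frac{d}{dz}\log\Gamma(z)$, $z > 0$, and $\Gamma(z)$ is the gamma function. In particular,

$$N = 2:\quad E\left\{\log\frac{y_{i+1}}{y_i}\,\middle|\,N\right\} = -\log 2, \tag{6.3}$$

$$N \text{ odd},\ N \ge 3:\quad E\left\{\log\frac{y_{i+1}}{y_i}\,\middle|\,N\right\} = \log 2 - \sum_{j=(N-1)/2}^{N-2}\frac{1}{j}, \tag{6.4}$$

$$N \text{ even},\ N \ge 4:\quad E\left\{\log\frac{y_{i+1}}{y_i}\,\middle|\,N\right\} = -\log 2 + \sum_{j=N/2}^{N-2}\frac{1}{j}. \tag{6.5}$$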
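As a numerical cross-check of Theorem 6.1, the sketch below evaluates (6.2) directly and compares it with the finite sums (6.3)-(6.5). It is an illustration, not the paper's computation. It assumes only the standard closed forms $\psi(n) = -C + \sum_{k<n} 1/k$ and $\psi(n + \tfrac12) = -C - \log 4 + \sum_{k=1}^{n} 2/(2k-1)$. The function names are hypothetical.

```python
import math

EULER_GAMMA = 0.5772156649015329        # Euler's constant C


def digamma_half_integer(z):
    """psi(z) for a positive integer or half-integer z, via exact finite sums."""
    twice = round(2 * z)
    if abs(2 * z - twice) > 1e-12 or twice <= 0:
        raise ValueError("z must be a positive integer or half-integer")
    if twice % 2 == 0:                  # psi(n) = -C + sum_{k=1}^{n-1} 1/k
        n = twice // 2
        return -EULER_GAMMA + sum(1.0 / k for k in range(1, n))
    n = (twice - 1) // 2                # psi(n + 1/2) = -C - log 4 + sum_{k=1}^{n} 2/(2k-1)
    return -EULER_GAMMA - math.log(4.0) + sum(2.0 / (2 * k - 1) for k in range(1, n + 1))


def expected_log_ratio(n_dim):
    """(6.2): log 2 + psi((N-1)/2) - psi(N-1)."""
    return (math.log(2.0) + digamma_half_integer((n_dim - 1) / 2)
            - digamma_half_integer(n_dim - 1))


def expected_log_ratio_sums(n_dim):
    """(6.3)-(6.5): the same expectation written with finite harmonic sums."""
    if n_dim == 2:
        return -math.log(2.0)
    if n_dim % 2 == 1:
        return math.log(2.0) - sum(1.0 / j for j in range((n_dim - 1) // 2, n_dim - 1))
    return -math.log(2.0) + sum(1.0 / j for j in range(n_dim // 2, n_dim - 1))


for n_dim in range(2, 11):
    print(n_dim, round(expected_log_ratio(n_dim), 4), round(expected_log_ratio_sums(n_dim), 4))
```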
PROOF. From the definition of the Spherical process, it is a Markov process in which, at each stage $i$, the next point $P_{i+1}(x, \omega)$ is selected independently and uniformly at random on $K(P_i)$, $i = 0, 1, 2, \dots$. Thus

$$E\{\log(y_{i+1}/y_i) \mid N\} = E\{\log(y_1/y_0) \mid N\}$$

for $i = 0, 1, 2, \dots$. Hence, it is sufficient to consider $E\{\log(y_1/y_0) \mid N\}$.
Owing to the earlier-mentioned symmetry for $\theta_i$, we need only consider selecting points uniformly at random on the surface of the appropriate $N$-dimensional hemisphere. With respect to $\theta_i$, this implies that the probability measure defined on the surface of the $N$-dimensional hemisphere is

$$\frac{\sin^{N-2}\theta\,d\theta}{\int_0^\pi \sin^{N-2}\theta\,d\theta},$$

where $0 \le \theta \le \pi$. By (6.1),

$$E\left\{\log\frac{y_1}{y_0}\,\middle|\,N\right\} = \frac{\int_0^\pi \log(1 - \cos\theta)\,\sin^{N-2}\theta\,d\theta}{\int_0^\pi \sin^{N-2}\theta\,d\theta}.$$

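Using $1 - \cos\theta = 2\sin^2(\theta/2)$, then letting $u = \theta/2$ with $\sin 2u = 2\sin u\cos u$, and using formula 483 in Peirce [20] and formula 6(c) of Table 338 in Gröbner and Hofreiter [8], we obtain

$$E\left\{\log\frac{y_{i+1}}{y_i}\,\middle|\,N\right\} = \log 2 + 2\,\frac{\int_0^\pi \log\left(\sin\frac{\theta}{2}\right)\sin^{N-2}\theta\,d\theta}{\int_0^\pi \sin^{N-2}\theta\,d\theta} = \log 2 + 2\cdot\frac{1}{2}\left[\psi\!\left(\frac{N-1}{2}\right) - \psi(N-1)\right].$$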


580 MERVIN E. MULLER 


Upon simplification we have the desired conclusion (6.2). We then obtain (6.3) from (6.2) by direct evaluation, e.g., by using Table 411 in Gröbner and Hofreiter [8].

When $N$ is odd and $N \ge 3$, we can use the result that

$$\psi(x) - \psi(z) = \int_0^1 \frac{t^{z-1} - t^{x-1}}{1 - t}\,dt$$

(formula 8(b) of Table 411, Gröbner and Hofreiter [8]). From formula 1(a) of Table 161, Gröbner and Hofreiter [8], we obtain upon simplification the desired conclusion (6.4).

(6.5) is obtained by a direct inductive argument that uses $\psi(z+1) = \psi(z) + 1/z$ and $\psi(\tfrac12) = -C - \log 4$, where $C$ is Euler's constant. With the explicit form of the expectation given by (6.3), (6.4), or (6.5), we now show that the expectation is negative. We proceed as follows. First, we note that

$$\log\frac{q+1}{p} < \sum_{j=p}^{q}\frac{1}{j} < \log\frac{q}{p-1} \tag{6.6}$$

for $p$ and $q$ integers with $1 \le p \le q$ (when $p = 1$ the right-hand bound is read as $+\infty$).

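For $N$ odd, $N \ge 3$, we have from (6.4) and (6.6), with $p = (N-1)/2$ and $q = N - 2$, that

$$\log\frac{N-3}{N-2} < E\left\{\log\frac{y_{i+1}}{y_i}\,\middle|\,N\right\} < 0, \qquad i = 0, 1, 2, \dots \tag{6.7}$$

(for $N = 3$ the left-hand bound is $-\infty$). Likewise, for $N$ even, $N \ge 4$, we have from (6.5) and (6.6), with $p = N/2$ and $q = N - 2$, that

$$\log\frac{N-1}{N} < E\left\{\log\frac{y_{i+1}}{y_i}\,\middle|\,N\right\} < 0, \qquad i = 0, 1, 2, \dots \tag{6.8}$$

Hence, by (6.3), (6.7), and (6.8), the expectation is negative for any finite $N \ge 2$. This completes the proof.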
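The inequality (6.6) is the usual integral comparison for the decreasing function $1/x$. The worked step below is added for the reader and is not reproduced from the paper. For each integer $j \ge 1$, $1/x < 1/j$ on $(j, j+1]$, and $1/x > 1/j$ on $[j-1, j)$ when $j \ge 2$. Summing over $j = p, \dots, q$ gives

$$\int_{p}^{q+1}\frac{dx}{x} < \sum_{j=p}^{q}\frac{1}{j} < \int_{p-1}^{q}\frac{dx}{x}, \qquad\text{i.e.}\qquad \log\frac{q+1}{p} < \sum_{j=p}^{q}\frac{1}{j} < \log\frac{q}{p-1}.$$

With $p = N/2$ and $q = N - 2$ (even $N$), the right-hand bound equals $\log 2$, which is exactly what makes (6.5) negative.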

For large values of N, the following asymptotic result will be useful. 

THEOREM 6.2. Subject to the conditions imposed on the $N$-dimensional Spherical process in Theorem 6.1, $E\{\log(y_{i+1}/y_i) \mid N\}$, $i = 0, 1, 2, \dots$, is asymptotically $-1/(2N)$ for large $N$, i.e., $2N\,E\{\log(y_{i+1}/y_i) \mid N\} \to -1$ as $N \to \infty$.

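PROOF. From Theorem 6.1,

$$E\left\{\log\frac{y_{i+1}}{y_i}\,\middle|\,N+1\right\} = \log 2 + \psi\!\left(\frac{N}{2}\right) - \psi(N), \qquad i = 0, 1, 2, \dots$$

From Nörlund [18], page 106, we have that

$$\psi(z) = \log z - \frac{1}{2z} - \sum_{k=1}^{m}\frac{B_{2k}}{2k\,z^{2k}} + R_{2m+2},$$

where $R_{2m+2}$ is a remainder term of order $z^{-(2m+2)}$ and $B_m$ is the $m$th Bernoulli number. Since $B_2 = 1/6$, we obtain

$$E\left\{\log\frac{y_{i+1}}{y_i}\,\middle|\,N+1\right\} = \log 2 + \left[\log\frac{N}{2} - \frac{1}{N} - \frac{1}{3N^2}\right] - \left[\log N - \frac{1}{2N} - \frac{1}{12N^2}\right] + O(N^{-4}) = -\frac{1}{2N} - \frac{1}{4N^2} + O(N^{-4}).$$

Consequently, for large $N$, we have the desired conclusion, namely,

$$E\left\{\log\frac{y_{i+1}}{y_i}\,\middle|\,N\right\} \sim -\frac{1}{2N}, \qquad i = 0, 1, 2, \dots$$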

We will now use Kolmogorov’s form of the strong law of large numbers to 
obtain the following result concerning convergence. 
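THEOREM 6.3. Given any $\epsilon > 0$ and $\eta > 0$, there exists an $n_0$ such that, subject to the conditions placed on the $N$-dimensional Spherical process in Theorem 6.1,

$$\Pr\left\{\left|\frac{1}{n}\log\frac{y_n}{y_0} + \left|E\left\{\log\frac{y_1}{y_0}\,\middle|\,N\right\}\right|\right| < \epsilon \ \text{ for all } n > n_0\right\} > 1 - \eta,$$

or equivalently,

$$\Pr\left\{y_0\,e^{-n(|E\{\log(y_1/y_0)\mid N\}| + \epsilon)} < y_n < y_0\,e^{-n(|E\{\log(y_1/y_0)\mid N\}| - \epsilon)} \ \text{ for all } n > n_0\right\} > 1 - \eta.$$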


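PROOF. By successive use of (6.1) we obtain $y_n/y_0 = \prod_{i=1}^{n}(1 - \cos\theta_i)$, or

$$\log\frac{y_n}{y_0} = \sum_{i=1}^{n}\log(1 - \cos\theta_i).$$

Using (6.1) and dividing both sides by $n$ yields

$$\frac{1}{n}\log\frac{y_n}{y_0} = \frac{1}{n}\left[\log\frac{y_1}{y_0} + \log\frac{y_2}{y_1} + \dots + \log\frac{y_n}{y_{n-1}}\right]. \tag{6.9}$$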
Consider the right-hand side of (6.9). From the definition of the Spherical process and the remarks made in the proof of Theorem 6.1, the quantities $\log(y_{i+1}/y_i) = \log(1 - \cos\theta_{i+1})$, $i = 0, 1, 2, \dots, n-1$, are mutually independent and identically distributed. Further, from Theorem 6.1 we know that

$$E\{\log(y_{i+1}/y_i) \mid N\} = E\{\log(y_1/y_0) \mid N\}$$

for $i = 0, 1, 2, \dots, n-1$. Then, since the expectation of a sum is the sum of the expectations (see for example Cramér [1], p. 173), (6.9) yields

$$E\left\{\frac{1}{n}\log\frac{y_n}{y_0}\,\middle|\,N\right\} = E\left\{\log\frac{y_1}{y_0}\,\middle|\,N\right\}.$$

Further, the expectation is bounded and negative. Since the $\log(y_{i+1}/y_i)$, $i = 0, 1, 2, \dots, n-1$, are mutually independent and identically distributed with bounded and equal first moments, we are able to use Kolmogorov's strong law of large numbers (see for example Doob [4], Theorem 5.1, page 142). Hence, the right side of (6.9) converges to $E\{\log(y_1/y_0) \mid N\}$ with probability 1. Consequently, $(1/n)\log(y_n/y_0)$ converges with probability 1 to $E\{\log(y_1/y_0) \mid N\}$. Representing the strong law of large numbers in the more expressive $\epsilon$, $\eta$ notation, we have the desired conclusions of the theorem.
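The convergence in Theorem 6.3 can be watched directly. The sketch below is an illustration only; the names are hypothetical and Python's `random.gauss` stands in for a generator of normal deviates. It draws the i.i.d. terms $\log(1 - \cos\theta)$ of (6.9) for $N = 3$ and prints the running average next to the limit $\log 2 - 1$ from (6.4).

```python
import math
import random


def log_step(n_dim, rng):
    """log(1 - cos theta) for a direction drawn uniformly on the unit N-sphere."""
    while True:
        g = [rng.gauss(0.0, 1.0) for _ in range(n_dim)]
        norm = math.sqrt(sum(c * c for c in g))
        if norm == 0.0:
            continue
        cos_theta = g[0] / norm
        if cos_theta < 1.0:             # cos theta = 1 has probability zero
            return math.log(1.0 - cos_theta)


def running_rate(n_dim, n_steps, seed=0):
    """(1/n) log(y_n / y_0) for one simulated path, computed as the sum in (6.9)."""
    rng = random.Random(seed)
    return sum(log_step(n_dim, rng) for _ in range(n_steps)) / n_steps


target = math.log(2.0) - 1.0            # (6.4) with N = 3
for n_steps in (100, 1_000, 10_000, 100_000):
    print(n_steps, round(running_rate(3, n_steps), 4), round(target, 4))
```

For $N = 3$ the per-step variance is 1 (Theorem 6.4), so after $10^5$ steps the average is typically within about 0.01 of the limit.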

We next consider finding the variance of the statistic $(1/n)\log(y_n/y_0)$.

THEOREM 6.4. Subject to the conditions imposed on the $N$-dimensional Spherical process in Theorem 6.1, the variance of $(1/n)\log(y_n/y_0)$ for fixed $n$ is a monotone decreasing function of $N$, and it is given as follows:

$$\sigma^2\left\{\frac{1}{n}\log\frac{y_n}{y_0}\,\middle|\,N\right\} = \frac{1}{n}\left[\psi'\!\left(\frac{N-1}{2}\right) - \psi'(N-1)\right], \tag{6.10}$$

where $\psi'(z)$ is the trigamma function, i.e., the derivative of the digamma function.
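Formula (6.10) is easy to tabulate. The sketch below is an illustration, not the paper's procedure. It uses the standard closed forms $\psi'(n) = \pi^2/6 - \sum_{k<n} 1/k^2$ and $\psi'(n + \tfrac12) = \pi^2/2 - \sum_{k=1}^{n} 4/(2k-1)^2$, and its function names are hypothetical. It also makes the monotone decrease in $N$ visible.

```python
import math


def trigamma_half_integer(z):
    """psi'(z) for a positive integer or half-integer z, via exact finite sums."""
    twice = round(2 * z)
    if abs(2 * z - twice) > 1e-12 or twice <= 0:
        raise ValueError("z must be a positive integer or half-integer")
    if twice % 2 == 0:                  # psi'(n) = pi^2/6 - sum_{k=1}^{n-1} 1/k^2
        n = twice // 2
        return math.pi ** 2 / 6 - sum(1.0 / k ** 2 for k in range(1, n))
    n = (twice - 1) // 2                # psi'(n + 1/2) = pi^2/2 - sum_{k=1}^{n} 4/(2k-1)^2
    return math.pi ** 2 / 2 - sum(4.0 / (2 * k - 1) ** 2 for k in range(1, n + 1))


def variance_of_rate(n_dim, n_steps=1):
    """(6.10): variance of (1/n) log(y_n / y_0) for the half-space case."""
    per_step = trigamma_half_integer((n_dim - 1) / 2) - trigamma_half_integer(n_dim - 1)
    return per_step / n_steps


for n_dim in range(2, 11):
    print(n_dim, round(variance_of_rate(n_dim), 3))
```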

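PROOF. As mentioned in the proofs of Theorems 6.1 and 6.3, the quantities $\log(y_{i+1}/y_i)$, $i = 0, 1, 2, \dots, n-1$, are mutually independent and identically distributed. Hence, using (6.9), we obtain

$$\sigma^2\left\{\frac{1}{n}\log\frac{y_n}{y_0}\,\middle|\,N\right\} = \frac{1}{n}\left[E\left\{\left(\log\frac{y_1}{y_0}\right)^2\,\middle|\,N\right\} - \left(E\left\{\log\frac{y_1}{y_0}\,\middle|\,N\right\}\right)^2\right]. \tag{6.11}$$

We already know $E\{\log(y_1/y_0) \mid N\}$ from Theorem 6.1. Since

$$E\{(\log(y_1/y_0))^2 \mid N\} = E\{[\log(1 - \cos\theta)]^2 \mid N\},$$

the problem is reduced to finding this latter expected value, i.e.,

$$E\{[\log(1 - \cos\theta)]^2 \mid N\} = \frac{\int_0^\pi [\log(1 - \cos\theta)]^2\sin^{N-2}\theta\,d\theta}{\int_0^\pi \sin^{N-2}\theta\,d\theta}.$$

Letting $1 - \cos\theta = 2\sin^2(\theta/2)$ and expanding, the above equals

$$(\log 2)^2 + (2\log 2)\cdot 2\,\frac{\int_0^\pi \log\left(\sin\frac{\theta}{2}\right)\sin^{N-2}\theta\,d\theta}{\int_0^\pi \sin^{N-2}\theta\,d\theta} + 4\,\frac{\int_0^\pi \left[\log\left(\sin\frac{\theta}{2}\right)\right]^2\sin^{N-2}\theta\,d\theta}{\int_0^\pi \sin^{N-2}\theta\,d\theta}.$$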

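From the proof of Theorem 6.1, we know that

$$2\,\frac{\int_0^\pi \log\left(\sin\frac{\theta}{2}\right)\sin^{N-2}\theta\,d\theta}{\int_0^\pi \sin^{N-2}\theta\,d\theta} = \psi\!\left(\frac{N-1}{2}\right) - \psi(N-1).$$

Letting $u = \sin(\theta/2)$ and $\sin\theta = 2\sin(\theta/2)\cos(\theta/2)$, and then using formula 59, Table 324, of Gröbner and Hofreiter [8], we find that

$$\int_0^\pi \left[\log\left(\sin\frac{\theta}{2}\right)\right]^2\sin^{N-2}\theta\,d\theta = 2^{N-4}\,\frac{\Gamma\!\left(\frac{N-1}{2}\right)^2}{\Gamma(N-1)}\left\{\left[\psi\!\left(\frac{N-1}{2}\right) - \psi(N-1)\right]^2 + \psi'\!\left(\frac{N-1}{2}\right) - \psi'(N-1)\right\}.$$

We obtain upon simplification that

$$E\{[\log(1 - \cos\theta)]^2 \mid N\} = (\log 2)^2 + 2\log 2\left[\psi\!\left(\frac{N-1}{2}\right) - \psi(N-1)\right] + \left[\psi\!\left(\frac{N-1}{2}\right) - \psi(N-1)\right]^2 + \psi'\!\left(\frac{N-1}{2}\right) - \psi'(N-1).$$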

Thus, using (6.11) and making straightforward simplifications yields (6.10). 
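We next show that the variance for fixed $n$ is a monotone decreasing function of $N$, i.e., that $\sigma^2\{(1/n)\log(y_n/y_0) \mid N\} > \sigma^2\{(1/n)\log(y_n/y_0) \mid N+1\}$ for any $N \ge 2$. We shall have the desired result if we show that

$$\frac{\psi'\!\left(\frac{N-1}{2}\right) - \psi'(N-1)}{\psi'\!\left(\frac{N}{2}\right) - \psi'(N)} > 1 \qquad\text{for any } N \ge 2. \tag{6.12}$$

Since $\psi'(z) = \sum_{r=0}^{\infty} 1/(z+r)^2$, (6.12) is equivalent to showing that

$$\frac{\sum_{r=0}^{\infty}\frac{4}{(N-1+2r)^2} - \sum_{r=0}^{\infty}\frac{1}{(N-1+r)^2}}{\sum_{r=0}^{\infty}\frac{4}{(N+2r)^2} - \sum_{r=0}^{\infty}\frac{1}{(N+r)^2}} > 1,$$

or, splitting the sums over $r$ into even and odd terms, that

$$3\sum_{r=0}^{\infty}\frac{1}{(N-1+2r)^2} - \sum_{r=0}^{\infty}\frac{1}{(N-1+2r+1)^2} > 3\sum_{r=0}^{\infty}\frac{1}{(N+2r)^2} - \sum_{r=0}^{\infty}\frac{1}{(N+2r+1)^2},$$

or equivalently,

$$3\sum_{r=0}^{\infty}\frac{1}{(N-1+2r)^2} + \sum_{r=0}^{\infty}\frac{1}{(N+1+2r)^2} > 4\sum_{r=0}^{\infty}\frac{1}{(N+2r)^2}.$$

Since each term of each series is positive, it suffices to compare the series term by term with $t = N + 2r$, and the question reduces to whether

$$\frac{3}{(t-1)^2} + \frac{1}{(t+1)^2} > \frac{4}{t^2} \qquad\text{for } t > 1.$$

Straightforward algebraic calculations show this latter condition to be true.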
Hence,

$$\sigma^2\left\{\frac{1}{n}\log\frac{y_n}{y_0}\,\middle|\,N\right\} > \sigma^2\left\{\frac{1}{n}\log\frac{y_n}{y_0}\,\middle|\,N+1\right\} \qquad\text{for any } N \ge 2.$$

With this result the proof is completed.
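The "straightforward algebraic calculation" behind the termwise comparison in the proof of Theorem 6.4 can be written out as follows. This is a worked step added here, not quoted from the paper:

$$\frac{3}{(t-1)^2} + \frac{1}{(t+1)^2} - \frac{4}{t^2} = \frac{3t^2(t+1)^2 + t^2(t-1)^2 - 4(t^2-1)^2}{t^2(t-1)^2(t+1)^2} = \frac{4\,(t^3 + 3t^2 - 1)}{t^2\,(t^2-1)^2}.$$

For $t > 1$ the numerator satisfies $t^3 + 3t^2 - 1 > 3 > 0$, so every term of $3\sum (N-1+2r)^{-2} + \sum (N+1+2r)^{-2}$ exceeds the matching term of $4\sum (N+2r)^{-2}$ (here $t = N + 2r \ge 2$).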

We will now use Lindeberg's form of the central limit theorem to obtain the following result, which is useful for determining asymptotic confidence intervals for the statistic $(1/n)\log(y_n/y_0)$.

THEOREM 6.5. Subject to the conditions imposed on the $N$-dimensional Spherical process in Theorem 6.1, the statistic $(1/n)\log(y_n/y_0) + |E\{\log(y_1/y_0) \mid N\}|$ is asymptotically normally distributed with mean 0 and variance

$$\sigma^2 = \sigma^2\left\{\frac{1}{n}\log\frac{y_n}{y_0}\,\middle|\,N\right\},$$

i.e.,

$$\lim_{n\to\infty}\Pr\left\{\frac{\frac{1}{n}\log\frac{y_n}{y_0} + \left|E\left\{\log\frac{y_1}{y_0}\,\middle|\,N\right\}\right|}{\sigma} \le x\right\} = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x} e^{-t^2/2}\,dt, \qquad\text{uniformly in } x.$$


PROOF. As in the proof of Theorem 6.3, we have that

$$\frac{1}{n}\log\frac{y_n}{y_0} = \frac{1}{n}\left[\log\frac{y_1}{y_0} + \log\frac{y_2}{y_1} + \dots + \log\frac{y_n}{y_{n-1}}\right].$$

Likewise, the quantities $\log(y_{i+1}/y_i)$, $i = 0, 1, \dots, n-1$, are mutually independent and identically distributed. From Theorems 6.1 and 6.4, we know the mean and variance of the quantities $\log(y_{i+1}/y_i)$ for $i = 0, 1, \dots, n-1$. Since the variance is finite, we can appeal to Lindeberg's form of the central limit theorem for the desired conclusion of the theorem (see for example Doob [4], Theorem 4.3, page 140).

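Hence, by applying Theorem 6.5, we can obtain asymptotic confidence intervals for $y_n/y_0$ of the form

$$\lim_{n\to\infty}\Pr\left\{\exp\left[-n\left(\left|E\left\{\log\frac{y_1}{y_0}\,\middle|\,N\right\}\right| + \lambda\sigma\right)\right] \le \frac{y_n}{y_0} \le \exp\left[-n\left(\left|E\left\{\log\frac{y_1}{y_0}\,\middle|\,N\right\}\right| - \lambda\sigma\right)\right]\right\} = \frac{1}{\sqrt{2\pi}}\int_{-\lambda}^{\lambda} e^{-t^2/2}\,dt, \tag{6.13}$$

where $\lambda > 0$ and $\sigma$ is as in Theorem 6.5.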
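Interval (6.13) needs only the per-step mean (Theorem 6.1) and the per-step variance ((6.10) with $n = 1$). The sketch below computes it. `ratio_interval` is a hypothetical name. The normal quantile comes from Python's `statistics.NormalDist`, and $\sigma$ is taken as $\sqrt{\text{per-step variance}/n}$, the standard deviation of $(1/n)\log(y_n/y_0)$.

```python
import math
from statistics import NormalDist


def ratio_interval(mean_log_step, var_log_step, n_steps, level=0.95):
    """Asymptotic interval (6.13) for y_n / y_0 at the given two-sided level."""
    lam = NormalDist().inv_cdf(0.5 + level / 2.0)   # lambda: two-sided normal quantile
    sigma = math.sqrt(var_log_step / n_steps)        # sd of (1/n) log(y_n / y_0)
    abs_mean = abs(mean_log_step)
    lower = math.exp(-n_steps * (abs_mean + lam * sigma))
    upper = math.exp(-n_steps * (abs_mean - lam * sigma))
    return lower, upper


# N = 3: mean log 2 - 1 from (6.4); per-step variance 1 from (6.10).
print(ratio_interval(math.log(2.0) - 1.0, 1.0, n_steps=50))
```

The interval is symmetric on the log scale about $-n\,|E\{\log(y_1/y_0) \mid N\}|$, and its log-width grows like $\sqrt{n}$ while the centre moves like $n$. That is why the upper end eventually falls below 1 as $n$ increases.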


A popular conjecture holds that a Monte Carlo procedure becomes more useful as the dimensionality of the problem increases. The basis for this belief has been that other numerical techniques usually require an increase in computing labor that is an exponential function of $N$, while a Monte Carlo technique should require only an increase that is a linear function of $N$. By appealing to the material developed in this section, we see that in this given situation the conjecture concerning the nature of the increase in computing labor when using a Monte Carlo technique is true. Suppose the amount of computing labor is measured by the average number of steps required to be arbitrarily near the boundary. Since, by Theorem 6.3, $\log(y_n/y_0)$ decreases roughly like $n\,E\{\log(y_1/y_0) \mid N\}$, the average number of steps required in the present situation depends on

$$\frac{1}{\left|E\left\{\frac{1}{n}\log\frac{y_n}{y_0}\,\middle|\,N\right\}\right|} = \frac{1}{\left|E\left\{\log\frac{y_1}{y_0}\,\middle|\,N\right\}\right|}.$$

But by Theorem 6.2, $1/|E\{\log(y_1/y_0) \mid N\}|$ is asymptotically $2N$. Hence, the increase in computing labor as a function of $N$ is approximately linear in $N$.
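The linear growth can be checked numerically. The sketch below uses hypothetical names and takes the leading-order count $\log(y_0/\delta)/|E\{\log(y_1/y_0)\mid N\}|$ as the prediction. It prints this count next to $2N\log(y_0/\delta)$ and simulates the half-space process for $N = 3$. The simulated mean comes out somewhat larger than the leading-order count, because the final step overshoots $\delta$. That effect is an assumption-level remark about this sketch, not a claim from the paper.

```python
import math
import random


def expected_log_ratio(n_dim):
    """(6.3)-(6.5): E{log(y_{i+1}/y_i) | N} for the half-space case."""
    if n_dim == 2:
        return -math.log(2.0)
    if n_dim % 2 == 1:
        return math.log(2.0) - sum(1.0 / j for j in range((n_dim - 1) // 2, n_dim - 1))
    return -math.log(2.0) + sum(1.0 / j for j in range(n_dim // 2, n_dim - 1))


def predicted_steps(n_dim, y0, delta):
    """Leading-order number of steps to go from distance y0 down to delta."""
    return math.log(y0 / delta) / abs(expected_log_ratio(n_dim))


def simulated_steps(n_dim, y0, delta, trials, seed=0):
    """Average number of Spherical-process steps until the distance is below delta."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        y, steps = y0, 0
        while y >= delta:
            g = [rng.gauss(0.0, 1.0) for _ in range(n_dim)]
            cos_theta = g[0] / math.sqrt(sum(c * c for c in g))
            y *= 1.0 - cos_theta        # (6.1)
            steps += 1
        total += steps
    return total / trials


for n_dim in (2, 4, 8, 16, 32):
    print(n_dim, round(predicted_steps(n_dim, 1.0, 1e-6), 1),
          round(2 * n_dim * math.log(1e6), 1))
print("simulated, N = 3:", simulated_steps(3, 1.0, 1e-6, trials=2000))
```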

Using the material of this section, we obtain the following numerical results:

N     Expectation           Asymptotic estimate     n times the variance     Appraisal of
      E{log(y1/y0) | N}     for expectation,        of (1/n) log(yn/y0)      linearity
                            -1/(2N)
2     -.6932                -.2500                  3.289                    1.817
3     -.3068                -.1667                  1.000                    1.917
4     -.1932                -.1250                   .540                    1.962
5     -.1401                -.1000                   .361                    1.961
6     -.1099                -.0833                   .269                    2.000
7     -.0901                -.0714                   .213                    1.973
8     -.0765                -.0625                   .177                    2.011
9     -.0663                -.0556                   .151
10    -.0587                -.0500                   .131
∞      0                     0
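The surviving text does not define the appraisal-of-linearity column. One reading reproduces every printed value: the difference of successive reciprocals $1/|E(N+1)| - 1/|E(N)|$, which would settle near 2 if $1/|E|$ grows like $2N$. The sketch below adopts that reading as an assumption. It recomputes the expectation column from (6.3)-(6.5) and computes the appraisal from the printed, rounded expectations. The rounding matters: from the exact expectations the $N = 6$ entry would be about 1.982 rather than 2.000.

```python
import math

# Expectations as printed in the table (rounded to four places), N = 2, ..., 10.
PRINTED = {2: -0.6932, 3: -0.3068, 4: -0.1932, 5: -0.1401, 6: -0.1099,
           7: -0.0901, 8: -0.0765, 9: -0.0663, 10: -0.0587}


def expected_log_ratio(n_dim):
    """(6.3)-(6.5): E{log(y_{i+1}/y_i) | N} for the half-space case."""
    if n_dim == 2:
        return -math.log(2.0)
    if n_dim % 2 == 1:
        return math.log(2.0) - sum(1.0 / j for j in range((n_dim - 1) // 2, n_dim - 1))
    return -math.log(2.0) + sum(1.0 / j for j in range(n_dim // 2, n_dim - 1))


def appraisal(values, n_dim):
    """Assumed reading of the last column: 1/|E(N+1)| - 1/|E(N)|."""
    return 1.0 / abs(values[n_dim + 1]) - 1.0 / abs(values[n_dim])


for n_dim in range(2, 11):
    row = [n_dim, f"{expected_log_ratio(n_dim):.4f}", f"{-1.0 / (2 * n_dim):.4f}"]
    if n_dim + 1 in PRINTED:
        row.append(f"{appraisal(PRINTED, n_dim):.3f}")
    print(*row)
```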


Thus, as $N$ increases, we see that in the given situation the process will require, on the average, more steps to get arbitrarily near the boundary, since $|E\{(1/n)\log(y_n/y_0) \mid N\}|$ decreases with increasing $N$. However, the process has the interesting compensating feature that the variance $\sigma^2\{(1/n)\log(y_n/y_0) \mid N\}$ also decreases with increasing $N$, which gives at least a more stable statistic for fixed $n$.

We will now see that the material developed in this section for the $N$-dimensional Spherical process, with $\Gamma[D]$ being an infinite $(N-1)$-dimensional hyperplane, can be used to obtain an upper bound for the order of the average number of steps required when $D + \Gamma[D]$ is any $N$-dimensional convex set.

THEOREM 6.6. Let $x_0$ be any point belonging to an $N$-dimensional convex set $D + \Gamma[D]$ and at distance $d_0$ from $\Gamma[D]$. Then the order of the average number of steps required for convergence to $\Gamma[D]$ by the $N$-dimensional Spherical process, defined on $D + \Gamma[D]$ and originating at $x_0$, is equal to or less than the order of the average number of steps required for convergence to any infinite $(N-1)$-dimensional hyperplane by the $N$-dimensional Spherical process, defined with respect to this infinite hyperplane and originating at distance $d_0$ from it, i.e.,

$$E\left\{\frac{1}{n}\log\frac{r_n}{r_0}\,\middle|\,N\right\} \le E\left\{\log\frac{y_1}{y_0}\,\middle|\,N\right\},$$

where $r_i$, $i = 0, 1, 2, \dots, n$, denotes the successive radii of the $N$-dimensional Spherical process with respect to $D + \Gamma[D]$, and $y_0 = d_0$ denotes the distance to a properly oriented $(N-1)$-dimensional hyperplane.

PROOF. Clearly, $r_n/r_0 = \prod_{i=0}^{n-1} r_{i+1}/r_i$. Hence, we have

$$\frac{1}{n}\log\frac{r_n}{r_0} = \frac{1}{n}\left[\log\frac{r_1}{r_0} + \log\frac{r_2}{r_1} + \dots + \log\frac{r_n}{r_{n-1}}\right]. \tag{6.14}$$

Let the Spherical process defined on $D + \Gamma[D]$ be at the point $x_i$ at the $i$th stage of the process, where $i$ is arbitrary. Then $r_i$ denotes the radius of the maximum $N$-sphere $K(x_i)$ of the Spherical process defined on $D + \Gamma[D]$. Since $D + \Gamma[D]$ is convex, it has an $(N-1)$-dimensional supporting hyperplane at each point of $\Gamma[D]$. Consequently, $r_i$ also denotes the distance from $x_i$ along a normal to an $(N-1)$-dimensional supporting hyperplane of $\Gamma[D]$ which is at a minimum distance from $x_i$. With reference to this hyperplane, consider the Spherical process discussed in Theorem 6.1, where $y_i = r_i$ has the meaning attributed to it in Theorem 6.1. From Theorem 6.1, $E\{\log(y_{i+1}/y_i) \mid N\}$ is known. Further, since $D + \Gamma[D]$ is convex, it lies on one side of the supporting hyperplane, so $r_{i+1} \le y_{i+1}$. Thus, $\log(r_{i+1}/r_i) \le \log(y_{i+1}/r_i) = \log(y_{i+1}/y_i)$. Hence, $E\{\log(r_{i+1}/r_i) \mid N\}$ exists and

$$E\{\log(r_{i+1}/r_i) \mid N\} \le E\{\log(y_{i+1}/y_i) \mid N\} = E\{\log(y_1/y_0) \mid N\}$$

for $i = 0, 1, \dots, n-1$. It is therefore permissible to apply the mathematical expectation operator to both sides of (6.14) (see Cramér [1], page 173).

Taking the expected value of both sides of (6.14) yields

$$E\{(1/n)\log(r_n/r_0) \mid N\} \le E\{\log(y_1/y_0) \mid N\}.$$

Thus, the proof is completed.
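The key step of the proof, $r_{i+1} \le y_{i+1}$, can be seen pathwise in a simple convex set. The sketch below is an illustration with hypothetical names. It takes $D$ to be the unit ball in $N = 3$ dimensions, runs the Spherical process, and at every step measures the new point's distance both to the sphere and to the supporting hyperplane at the nearest boundary point. The average of $\log(y_{i+1}/y_i)$ should match $\log 2 - 1$ from (6.4), while the average of $\log(r_{i+1}/r_i)$ lies below it.

```python
import math
import random


def unit_vector(n_dim, rng):
    """Uniform random direction on the unit N-sphere (normalized normal deviates)."""
    while True:
        g = [rng.gauss(0.0, 1.0) for _ in range(n_dim)]
        norm = math.sqrt(sum(c * c for c in g))
        if norm > 0.0:
            return [c / norm for c in g]


def compare_one_step(point, rng):
    """One Spherical-process step in the unit ball, plus its half-space analogue.

    r is the current distance to the sphere |x| = 1 (the radius of K(x_i)); r_next is
    the distance after the step; y_next is the distance of the same new point from the
    supporting hyperplane {x : n . x = 1} at the boundary point nearest to `point`.
    """
    norm_p = math.sqrt(sum(c * c for c in point))
    r = 1.0 - norm_p
    u = unit_vector(len(point), rng)
    new_point = [p + r * c for p, c in zip(point, u)]
    r_next = 1.0 - math.sqrt(sum(c * c for c in new_point))
    normal = [c / norm_p for c in point]            # outward normal at the nearest boundary point
    y_next = 1.0 - sum(a * b for a, b in zip(normal, new_point))
    return new_point, r, r_next, y_next


rng = random.Random(2)
start = [0.5, 0.1, 0.0]                             # N = 3, away from the centre
point, log_r, log_y, steps = start, 0.0, 0.0, 0
while steps < 20_000:
    new_point, r, r_next, y_next = compare_one_step(point, rng)
    if r_next <= 0.0 or y_next <= 0.0:              # floating-point corner case: restart
        point = start
        continue
    assert r_next <= y_next + 1e-12                 # r_{i+1} <= y_{i+1}, as in the proof
    log_r += math.log(r_next / r)
    log_y += math.log(y_next / r)
    steps += 1
    point = new_point if r_next > 1e-6 else start   # restart once very close to the boundary
print(log_r / steps, log_y / steps, math.log(2.0) - 1.0)
```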

From Theorem 6.6 we are able to use (6.13) to obtain upper bounds for the asymptotic confidence intervals for the average distance of the $N$-dimensional Spherical process from the boundary $\Gamma[D]$ when $D + \Gamma[D]$ is any convex set.


7. Machine techniques. 

7.1. Generation of positions. To use either the $N$-dimensional Spherical or the Generalized Spherical process on a computer, one must be able to generate positions uniformly at random on the surface of any given $N$-sphere. Indeed, it is the ease of carrying out this operation that makes these two classes of processes desirable. We restrict the subsequent discussion to the unit $N$-sphere, since this involves no loss in generality.

The following procedure is one way to generate points uniformly at random on the surface of the unit $N$-sphere.

Generate $N$ independent normal deviates $x_i$ (mean 0, variance 1), $i = 1, 2, \dots, N$. Then, corresponding to the point $x = (x_1, x_2, \dots, x_N)$, locate the point $y$ on the unit $N$-sphere having the $N$ direction cosines $x_i/\sqrt{x_1^2 + x_2^2 + \dots + x_N^2}$, $i = 1, 2, \dots, N$. The points $y$ obtained this way are uniformly distributed on the surface of the unit $N$-sphere. This follows from known properties of the normal distribution (see for example Cramér [1], Chap. 24). The essential details are that the density function of $x$, say $g(x) = g(x_1, x_2, \dots, x_N)$, is an $N$-dimensional canonical normal density function, and that the $N$-dimensional canonical normal density function is constant on the surfaces of $N$-dimensional spheres centered at the origin. It is not necessary to go into the details of how to generate normal deviates, since Teichroew [21] gives detailed procedures for generating normal deviates on high-speed computers.
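The procedure of Section 7.1 translates almost line for line into Python. In the sketch below, Python's built-in `random.gauss` stands in for the normal-deviate generators described by Teichroew; this substitution is an assumption of the sketch. The function name is hypothetical. The short check uses the fact that each coordinate of a uniform point on the unit $N$-sphere has mean 0 and mean square $1/N$.

```python
import math
import random


def random_point_on_sphere(n_dim, rng=random):
    """Uniform random point on the unit N-sphere: normalize N independent N(0,1) deviates."""
    while True:
        x = [rng.gauss(0.0, 1.0) for _ in range(n_dim)]
        r = math.sqrt(sum(c * c for c in x))
        if r > 0.0:                       # the zero vector has probability 0; guard anyway
            return [c / r for c in x]     # direction cosines x_i / sqrt(sum_j x_j^2)


rng = random.Random(42)
pts = [random_point_on_sphere(5, rng) for _ in range(20000)]
print(sum(p[0] for p in pts) / len(pts))        # near 0 by symmetry
print(sum(p[0] ** 2 for p in pts) / len(pts))   # near 1/N = 0.2
```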

For reasons of practical feasibility and economy when using a computer, any of the processes discussed in this paper must be modified by introducing a truncation procedure. By a truncation procedure for a process, we mean that the process is terminated once it has come within a prescribed distance, say $\delta$, of the given boundary $\Gamma[D]$. We next turn to the question of truncation.

7.2. $\delta$-truncation of first order. For any of the three given classes of processes, a $\delta$-truncation procedure of first order is defined as follows. The first time the given process is within $\delta$ of the boundary $\Gamma[D]$, the process is terminated. The nearest point of $\Gamma[D]$, say $y$, is recorded as the point to which the process would have converged, and the value $f(y)$ is tallied.
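The first-order truncation rule can be sketched as a small solver. The example below is illustrative only. It assumes the domain is the unit ball in $N = 3$ dimensions and uses the boundary data $f(x) = x_1^2 - x_2^2$, which is harmonic in $N$-space, so the exact solution at $x_0$ is the same expression evaluated at $x_0$. The function names are hypothetical, and the number of walks is chosen only to keep the run short.

```python
import math
import random


def walk_on_spheres_first_order(x0, f, delta, rng):
    """Spherical process in the unit ball with delta-truncation of first order.

    Stops the first time the distance to the sphere |x| = 1 is below delta and
    tallies f at the nearest boundary point x / |x|.
    """
    x = list(x0)
    while True:
        norm = math.sqrt(sum(c * c for c in x))
        r = 1.0 - norm                              # radius of the maximal sphere K(x)
        if r < delta:
            return f([c / norm for c in x])         # nearest point of the boundary
        g = [rng.gauss(0.0, 1.0) for _ in x]
        g_norm = math.sqrt(sum(c * c for c in g))
        if g_norm == 0.0:
            continue
        x = [c + r * d / g_norm for c, d in zip(x, g)]   # uniform point on K(x)


def estimate_u(x0, f, delta=1e-3, walks=4000, seed=0):
    """Monte Carlo estimate of u(x0): average of the tallied boundary values."""
    rng = random.Random(seed)
    return sum(walk_on_spheres_first_order(x0, f, delta, rng) for _ in range(walks)) / walks


def boundary_data(p):
    return p[0] ** 2 - p[1] ** 2    # harmonic in R^N, so u(x0) equals this expression at x0


x0 = (0.3, 0.2, 0.0)
print(estimate_u(x0, boundary_data), boundary_data(x0))
```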

The question then arises as to how this truncation procedure affects the solution of the Dirichlet problem, say at the point $x_0$. For the sake of brevity, and without loss of generality, we consider only the Spherical process for the present. If we use the $\delta$-truncation procedure of first order, we change the estimate of $\Pr(S(x_0, \omega), dx, D)$. The error introduced by this change will be studied from a heuristic point of view. We know from Section 4 that $\Pr(S(x_0, \omega), dx, D)$ is a harmonic function in $D$ and that $\lim_{x_0 \in D,\ x_0 \to x}\Pr(S(x_0, \omega), dx, D) = 1$ or $0$, according as $x$ is an inner point of $dx$ or of $\Gamma[D] - dx$, where $dx$ represents an elementary set of $\Gamma[D]$. Thus, $\Pr(S(x_0, \omega), dx, D)$ is, except for the end points of $dx$, a continuous function on $D + \Gamma[D]$. Further, $f(x)$ was given to be continuous, and thus bounded, on $\Gamma[D]$. Let $M = \max_{x \in \Gamma[D]}|f(x)|$. Further, the process is a Markov process with stationary independent increments. Consequently, given an $\epsilon > 0$, there exists a $\delta$ such that the maximum error in $\Pr(S(x_0, \omega), dx, D)$ from $\delta$-truncation of the first order is less than $\epsilon/M$. Thus, the absolute value of the difference between $u(x_0)$ and the value obtained by using the $\delta$-truncation process of first order can be made less than $\epsilon$ by selecting $\delta$ small enough.

7.3. $\delta$-truncation of higher order. The following type of $\delta$-truncation is suggested as a tentative procedure, and it represents an interesting field for further investigation. It differs from the $\delta$-truncation procedure of first order as follows. Once the process has arrived at a point, say $z$, within distance $\delta$ of the boundary, a direction within a restricted $N$-dimensional solid angle is selected instead of the closest boundary point. The method then proceeds in the selected direction until $\Gamma[D]$ is intersected, and the value of $f(x)$ at this point of intersection is used as the tally. For the Spherical or the Generalized Spherical process, the direction is selected uniformly at random in a restricted region. The General Dirichlet Domain process would select the direction according to the normalized normal derivative of the Green's function of the particular admissible domain being used. Whichever process is used, the solid angle is restricted to lie within the $N$-dimensional hemisphere determined by the infinite $(N-1)$-dimensional hyperplane that passes through $z$ and is parallel to the supporting hyperplane at the boundary point nearest to $z$. This procedure seems worthy of study for the following reasons.

For the infinite $(N-1)$-dimensional hyperplane in $N$-space, the $\delta$-truncation procedure of first order still produces an error. However, the $\delta$-truncation procedure suggested here provides an exact solution of this problem. This follows from an $N$-dimensional generalization of the material given in Kellogg [13], pp. 66-69. Consequently, for a well-behaved boundary $\Gamma[D]$ and any point sufficiently close to $\Gamma[D]$, it seems intuitively evident that the portion of the boundary near the point in question can be approximated reasonably well by an $(N-1)$-dimensional hyperplane. If this is true, then clearly the $\delta$-truncation procedure of higher order will contribute a smaller error than the $\delta$-truncation process of first order. Other higher-order truncation procedures could use knowledge of the curvature of $\Gamma[D]$ when the direction is selected. Likewise, discrete analogs of these truncation procedures could be adopted in place of the classical interpolation methods used near the boundary in discrete approximations to the Dirichlet problem.
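Why the higher-order rule is exact for a flat boundary can be checked numerically. A direction chosen uniformly in the hemisphere facing the hyperplane strikes it with density proportional to $h/|x - z|^N$, where $h$ is the distance from $z$ to the hyperplane. This is the half-space Poisson kernel, since the solid-angle element equals $\cos\varphi\,dA/|x - z|^{N-1}$ and $\cos\varphi = h/|x - z|$. In two dimensions the hit point is therefore Cauchy distributed with scale $h$, so half of the hits fall within distance $h$ of the foot of the perpendicular. The sketch below is illustrative, with hypothetical names and an assumed flat boundary.

```python
import math
import random


def hemisphere_exit(z, normal, rng):
    """Higher-order truncation step for a flat boundary {x : normal . x = 0}.

    `normal` is the unit normal pointing from the domain toward the boundary, and
    z is an interior point with normal . z < 0. A direction is drawn uniformly on
    the hemisphere facing the boundary and followed until the hyperplane is hit.
    """
    while True:
        g = [rng.gauss(0.0, 1.0) for _ in z]
        norm = math.sqrt(sum(c * c for c in g))
        if norm == 0.0:
            continue
        u = [c / norm for c in g]
        cos_a = sum(a * b for a, b in zip(u, normal))
        if cos_a == 0.0:
            continue                                # parallel to the plane: never hits it
        if cos_a < 0.0:                             # reflect into the boundary-facing hemisphere
            u, cos_a = [-c for c in u], -cos_a
        h = -sum(a * b for a, b in zip(z, normal))  # distance from z to the hyperplane
        t = h / cos_a
        return [a + t * b for a, b in zip(z, u)]


rng = random.Random(3)
h = 0.01
hits = [hemisphere_exit([0.0, -h], [0.0, 1.0], rng)[0] for _ in range(20000)]
frac = sum(abs(x) < h for x in hits) / len(hits)
print(frac)   # the Cauchy (Poisson-kernel) law gives 1/2
```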


Acknowledgments. The topic and much of the material of this paper are a direct outgrowth of suggestions given to the author by Professor George W. Brown, without whose help this paper would not have been possible. Professor Robert Steinberg's contributions have also been exceedingly valuable. The author wishes to thank Professors Brown and Steinberg for their generous aid and encouragement. An enlightening conversation with Dr. T. E. Harris and a careful reading of an earlier version of this manuscript by Dr. D. Teichroew are appreciated.


REFERENCES

[1] H. Cramér, Mathematical Methods of Statistics, Princeton University Press, 1946.
[2] J. H. Curtiss, "Sampling methods applied to differential and difference equations," Seminar on Scientific Computation, November, 1949, International Business Machines Corp., New York, 1950.
[3] J. L. Doob, "Stochastic processes depending on a continuous parameter," Trans. Amer. Math. Soc., Vol. 42 (1937), pp. 107-139.
[4] J. L. Doob, Stochastic Processes, John Wiley and Sons, New York, 1952.
[5] J. L. Doob, "Semimartingales and subharmonic functions," Trans. Amer. Math. Soc., Vol. 77 (1954), pp. 86-121.
[6] A. Dvoretzky, P. Erdős, and S. Kakutani, "Double points of paths of Brownian motion in N-space," Acta Scientiarum Mathematicarum, Vol. 12 (1950), pp. 75-81.
[7] W. Feller, "Some recent trends in the mathematical theory of diffusion," Proceedings of the International Congress of Mathematicians 1950, American Mathematical Society, Vol. 2, 1952, pp. 322-339.
[8] W. Gröbner and N. Hofreiter, Integraltafel, Zweiter Teil, Bestimmte Integrale, Springer-Verlag, Vienna, 1950.
[9] M. Kac, "On some connections between probability theory and differential and integral equations," Proceedings of the Second Berkeley Symposium on Mathematical Statistics and Probability, University of California Press, 1951, pp. 189-215.
[10] S. Kakutani, "On Brownian motions in N-space," Proc. Imp. Acad. Japan, Vol. 20 (1944), pp. 648-652.
[11] S. Kakutani, "Two-dimensional Brownian motion and harmonic functions," Proc. Imp. Acad. Japan, Vol. 20 (1944), pp. 706-714.
[12] O. D. Kellogg, "Recent progress with the Dirichlet problem," Bull. Amer. Math. Soc., Vol. 32 (1926), pp. 601-625.
[13] O. D. Kellogg, Foundations of Potential Theory, Frederick Ungar, 1929.
[14] A. N. Kolmogorov, Foundations of the Theory of Probability, Chelsea Publishing Co., 1950.
[15] P. Lévy, Processus Stochastiques et Mouvement Brownien, Gauthier-Villars et Cie, Paris, 1948.
[16] P. Lévy, "Le mouvement Brownien plan," Amer. J. Math., Vol. 62 (1940), pp. 487-550.
[17] N. Metropolis and S. Ulam, "The Monte Carlo method," J. Amer. Statist. Assoc., Vol. 44 (1949), pp. 335-341.
[18] N. E. Nörlund, Vorlesungen über Differenzenrechnung, Julius Springer, Berlin, 1924.
[19] R. Paley and N. Wiener, Fourier Transforms in the Complex Domain, American Mathematical Society Colloquium Publications, Vol. 19, 1934.
[20] B. O. Peirce, A Short Table of Integrals, Ginn and Company, 1929.
[21] D. Teichroew, Distribution Sampling with High Speed Computers, Ph.D. Thesis, University of North Carolina, 1953.
[22] C. de la Vallée Poussin, Les Nouvelles Méthodes de la Théorie du Potentiel et le Problème Généralisé de Dirichlet, Actualités Scientifiques et Industrielles, No. 578, Hermann et Cie, Paris, 1937.





CONTRIBUTIONS TO THE THEORY OF RANK ORDER STATISTICS 
THE TWO-SAMPLE CASE 


By I. Richard Savage


National Bureau of Standards and Stanford University 


1. Introduction. The idea of a statistical test of a hypothesis, and the related concepts introduced by Neyman and Pearson, have served as a model for much of modern statistics. In nonparametric work it is seldom possible to apply all of these concepts, because for most of the alternatives that have been considered there exist neither optimum critical regions nor analytic tools for finding power functions. The sign test illustrates a case where the exact power function can be found; on the other hand, this procedure is seldom optimum. The $c_1$ test [11] has optimum limiting properties, but little is known about its power function for small samples. The Kolmogorov and Smirnov tests [6] have a certain intuitive appeal, but their only justification is consistency. The Wilcoxon test [9] is justified on the basis that it is analogous to a good parametric procedure, but it has little direct justification.

In the course of this paper we will consider several nonparametric hypotheses that have been treated previously. In Section 5 it will be indicated that for the two-sample problem with such alternatives as slippage, there do not exist optimum nonparametric tests. In particular, we show that the class of admissible tests is too large to be of use. In Section 6 alternatives involving monotone likelihood ratios are considered, and a necessary criterion for admissibility is given. In particular, two normal populations differing only in mean value are considered. It is shown that several of the previously proposed tests of this hypothesis satisfy this criterion. Section 7 deals with a special subclass of the alternatives used in Section 6. Members of this subclass are the extreme-value distribution and the exponential distribution. For these alternatives we not only have the results of the previous section on the construction of admissible tests, but are also able to construct optimum nonparametric tests for small samples and to evaluate the operating characteristics of these tests. These small-sample tests are uniformly most powerful rank order tests and most stringent rank order tests. The limiting optimum test is also given.


2. Notation. The main concern in the following will be the situation where there are random variables $X_1, \dots, X_m$, independently distributed, each with continuous distribution function $F(x)$, and random variables $Y_1, \dots, Y_n$ which are independent of the $X$'s and are independently distributed, each with continuous distribution $G(x)$, i.e., two independent samples.

The observed values $x_1, \dots, x_m$ of the random variables $X_1, \dots, X_m$ will be called the first sample, and the observed values $y_1, \dots, y_n$ of the random variables $Y_1, \dots, Y_n$ will be called the second sample. When all of the observed values are ordered from smallest to largest, they form a sequence which

Received June 1, 1954 





RANK ORDER STATISTICS 591 


will be denoted by $w_1, \dots, w_{m+n}$. A new sequence $z_1, \dots, z_{m+n}$ can be formed from the $w$ sequence by letting $z_i = 0$ if $w_i$ comes from the first sample and $z_i = 1$ if $w_i$ comes from the second sample ($i = 1, \dots, m+n$). From the $z$ sequence two other sequences are defined by the following formulas:

$$v_i = \sum_{j=1}^{i} z_j, \qquad u_i = i - v_i. \tag{2.1}$$

The ranks of the observations from the first (second) sample, denoted by $r_1, \dots, r_m$ ($s_1, \dots, s_n$), are the subscripts of those $z_i = 0$ ($z_i = 1$) arranged in increasing order. Corresponding to the observed values $w_i$, $z_i$, $u_i$, $v_i$, $r_i$, and $s_i$ are the random variables $W_i$, $Z_i$, $U_i$, $V_i$, $R_i$, and $S_i$. An entire sequence such as $u_1, \dots, u_{m+n}$ will be denoted by the corresponding letter $u$ without a subscript. Note that any one of $z$, $u$, $v$, $r$, or $s$ determines the others; in general, these sequences will be referred to as rank orders. All of the above quantities are uniquely defined with probability one as a result of the assumption that the original distribution functions are continuous.

The following symbols will be used to denote special rank orders:


I < II: To be read as "Sample I is less than Sample II," i.e., all of the $x$'s are less than all of the $y$'s.

I a< II: To be read as "Sample I is almost less than Sample II," i.e., all of the $x$'s are less than all of the $y$'s, except that there is one $x$ larger than one $y$.

The symbols II < I and II a< I are defined analogously. Thus, when $m = n = 3$, there are, among others, the following representations for some of the rank orders:

              z          v
I < II        000111     000123
I a< II       001011     001123
II a< I       110100     122333
II < I        111000     123333
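The sequences defined in (2.1) are mechanical to compute. The sketch below (a hypothetical helper, not from the paper) builds $w$, $z$, $u$, $v$, $r$, and $s$ from two tie-free samples. The example samples are made up so that they produce the "I a< II" row of the table.

```python
def rank_orders(xs, ys):
    """Compute the w, z, u, v, r, s sequences of Section 2 for two samples without ties."""
    pooled = sorted([(x, 0) for x in xs] + [(y, 1) for y in ys])
    if len({value for value, _ in pooled}) != len(pooled):
        raise ValueError("ties have probability zero under the basic assumption")
    w = [value for value, _ in pooled]
    z = [label for _, label in pooled]                        # 0: first sample, 1: second
    v, running = [], 0
    for zi in z:                                              # (2.1): v_i = z_1 + ... + z_i
        running += zi
        v.append(running)
    u = [i - vi for i, vi in enumerate(v, start=1)]           # (2.1): u_i = i - v_i
    r = [i for i, zi in enumerate(z, start=1) if zi == 0]     # ranks of the first sample
    s = [i for i, zi in enumerate(z, start=1) if zi == 1]     # ranks of the second sample
    return w, z, u, v, r, s


_, z, u, v, r, s = rank_orders([0.4, 2.5, 1.1], [3.0, 1.7, 4.2])
print("".join(map(str, z)), "".join(map(str, v)), r, s)   # 001011 001123 [1, 2, 4] [3, 5, 6]
```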


When a distribution function $F(x)$ has a density function, the density will be denoted by the corresponding lower case letter $f(x)$.


3. Hypotheses. For all testing situations considered, the following basic assumption will be made.

BASIC ASSUMPTION. The random variables $X_1, \dots, X_m, Y_1, \dots, Y_n$ are mutually independent. The $X$'s have a common continuous cumulative distribution function $F(x)$, and the $Y$'s have a common continuous cumulative distribution function $G(x)$.

The null hypothesis will be

$$H_0:\ F(x) = G(x).$$





I. RICHARD SAVAGE 


The following alternatives will be treated:

$H_S$ (Slippage): $F(x) \ge G(x)$, where the inequality is strict for some $x$.

$H_T$ (Translation): $G(x) = F(x - \theta)$, where $\theta > 0$.

$H_{TS}$ (Translation and Symmetry)¹: $G(x) = F(x - \theta)$, where $\theta > 0$ and $F(x) + F(-x) = 1$.

$H_{TSU}$ (Translation, Symmetry, and Unimodal): $G(x) = F(x - \theta)$, where $\theta > 0$, $F(x) + F(-x) = 1$, and where $b > a > 0$ and $c > 0$ imply $F(a + c) - F(a) \ge F(b + c) - F(b)$.

$H_M$ (Monotone likelihood ratio): $F(x)$ and $G(x)$ have density functions $f(x) = h(x, \theta_1)$ and $g(x) = h(x, \theta_2)$, where, if $x_1 < x_2$ and $\theta_1 < \theta_2$, then

$$h(x_1, \theta_1)\,h(x_2, \theta_2) - h(x_1, \theta_2)\,h(x_2, \theta_1) \ge 0.$$

$H_L$ (Lehmann): $F(x) = [H(x)]^{\Delta_1}$ and $G(x) = [H(x)]^{\Delta_2}$, where $\Delta_2 > \Delta_1 > 0$ and $H(x)$ is a continuous cumulative distribution function.

$H_E$ (Exponential): $F(x) = \Theta(x, \Delta_1)$ and $G(x) = \Theta(x, \Delta_2)$, where $\Delta_2 > \Delta_1 > 0$ and

$$\Theta(x, \Delta) = \begin{cases} e^{\Delta x}, & x < 0,\\ 1, & x \ge 0.\end{cases}$$

$H_{EV}$ (Extreme Value): $F(x) = \Omega(x, \Delta_1)$ and $G(x) = \Omega(x, \Delta_2)$, where $\Delta_2 > \Delta_1$ and

$$\Omega(x, \Delta) = \exp\left[-e^{-(x - \Delta)}\right].$$


$H_N$ (Normal): $F(x)$ and $G(x)$ have the density functions $f(x) = N(x, \theta_1)$ and $g(x) = N(x, \theta_2)$, where $\theta_2 > \theta_1$ and

$$N(x, \theta) = \frac{1}{\sqrt{2\pi}}\exp\left[-(x - \theta)^2/2\right].$$

The basic assumption of continuity of the distribution functions implies that the occurrence of equal observations is an event with zero probability. In practice, ties will occur, and the methods of this paper will need to be modified to accommodate them. The constants in the alternative hypotheses are chosen so that we need only consider one-sided tests; however, the methods of this paper can be adapted to the two-sided cases.

The distribution of rank orders under $H_0$ is not affected by the underlying distribution function. Therefore, from the standpoint of distribution theory, $H_0$ may be considered a simple hypothesis as far as rank order tests are concerned. The alternative hypotheses can be thought of as either simple or composite, and the interpretation used will be clear from the text. Thus, in the alternative $H_T$


1 The point of symmetry has been picked as the origin simply as a matter of convenience. 







we have a simple hypothesis if $F(x)$ and $\theta$ are held fixed; a composite hypothesis if $F(x)$ is held fixed and all $\theta > 0$ are considered; and a composite hypothesis if we consider arbitrary $F(x)$ and all $\theta > 0$.

The alternative hypotheses are related in the following ways: 

1. All of the alternatives are special cases of $H_S$.

2. $H_{TSU}$ is a special case of $H_{TS}$, which is a special case of $H_T$.

3. When $H(x)$ in $H_L$ has a density function, $H_L$ is a special case of $H_M$.

4. $H_E$ and $H_{EV}$ are special cases of $H_L$ and of $H_M$.

5. $H_N$ is a special case of $H_M$.

Nonparametric tests of $H_0$ against $H_L$ will be introduced in Section 7. As a basis for determining the effectiveness of these procedures, their operating characteristics will be compared with those of the best parametric test of $H_0$ against $H_E$. Since $H_L$ is a nonparametric alternative, there is no best parametric procedure. However, if $H(x)$ is known and each observation $x$ is replaced by $\ln H(x)$, the testing situation becomes the parametric one just described. Thus the parametric situation serves as a basis of comparison.


4. Construction of rank order tests for small samples. In the two-sample rank order case the sample space consists of the $J = \binom{m+n}{m}$ points, or rank orders, $z^1, \dots, z^J$. A test consists of a sequence of numbers $a_1, \dots, a_J$ together with the rule that if the rank order $z^i$ occurs, the null hypothesis is rejected with probability $a_i$. Since the rank orders are equally likely under the null hypothesis, the size of the critical region is $\sum_i a_i / J$. If, for each alternative hypothesis under consideration, the rank order $z^i$ is at least as probable as the rank order $z^j$, then a necessary condition for a test to be admissible is that $a_i \ge a_j$. Using this as a criterion, it is often possible to determine the values of at least some of the $a_i$'s in a specific problem. Unfortunately, the probabilities of the rank orders are seldom uniformly ordered, and hence uniformly most powerful rank order tests seldom occur. However, the following situation does occur in practice, and we shall see examples of it in Sections 6 and 7.

Let us assume that a test with level of significance $K/J$ is desired, where for simplicity we assume that $K$ is an integer. Then it is clear that the following rules must be followed in constructing admissible tests. If there are $K$ or more rank orders always more probable than $z^i$, then $a_i = 0$. If there are $J - K$ or more rank orders always less probable than $z^i$, then $a_i = 1$. In general, the criterion of admissibility alone does not determine the values of the remaining $a_i$'s.
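The two rules above can be applied mechanically once the relation "always more probable" among the $J$ rank orders is known. The Python sketch below assumes the caller supplies that relation as a set of pairs `(a, b)`, meaning rank order `a` is always more probable than rank order `b`, and that the set is already transitively closed. The function name `forced_decisions` and this encoding are illustrative assumptions, not taken from the paper.

```python
def forced_decisions(J, K, more_probable):
    """Return {i: a_i} for the a_i fixed by the two admissibility rules.

    J: number of rank orders, labelled 0, ..., J - 1.
    K: integer such that the desired level of significance is K/J.
    more_probable: set of pairs (a, b) meaning rank order a is always
        more probable than rank order b (assumed transitively closed).
    Rank orders absent from the result are left undetermined.
    """
    forced = {}
    for i in range(J):
        n_above = sum(1 for a, b in more_probable if b == i)  # orders beating i
        n_below = sum(1 for a, b in more_probable if a == i)  # orders beaten by i
        if n_above >= K:
            forced[i] = 0          # K or more orders are always more probable
        elif n_below >= J - K:
            forced[i] = 1          # J - K or more orders are always less probable
    return forced


# A simple chain 0 > 1 > ... > 5 (J = 6, level 2/6): the test is fully determined.
chain = {(a, b) for a in range(6) for b in range(a + 1, 6)}
print(forced_decisions(6, 2, chain))    # {0: 1, 1: 1, 2: 0, 3: 0, 4: 0, 5: 0}

# Orders 1 and 2 are not comparable (J = 4, level 2/4): a_1 + a_2 = 1 stays free.
partial = {(0, 1), (0, 2), (0, 3), (1, 3), (2, 3)}
print(forced_decisions(4, 2, partial))  # {0: 1, 3: 0}
```

When the relation is a simple chain, the rules determine every $a_i$ except possibly a single randomized one. When the relation is only partial, some $a_i$ remain free, which is the situation discussed in Sections 5 through 7.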


5. Slippage alternatives. In this section we consider the alternatives $H_S$, $H_T$, $H_{TS}$, and $H_{TSU}$ introduced in Section 3. Admissible and other optimum tests
will not be constructed. Instead, several examples will be given indicating that 
the class of admissible tests is so large it is unlikely that uniformly most powerful 
or related optimum tests exist. This does not mean that there do not exist tests 
of these hypotheses with some optimum properties. For instance, there exist 





594 I. RICHARD SAVAGE 


unbiased tests of these hypotheses (Lehmann [7]). However, there is no evidence that Lehmann's procedure is the best unbiased test.

A reasonable conjecture is that $I < II$ (every observation of the first sample is less than every observation of the second) is the most probable rank order under $H_S$: $F(x) \ge G(x)$. Section 6 shows that $I < II$ is the most probable rank order when the two samples come from normal populations that are identical except that the second has the larger mean. Other statistically important examples will be given in which $H_S$ is compatible with $I < II$ being the most probable rank order.

However, $H_S$ is not sufficient to ensure that $I < II$ is the most probable rank order. In fact, Example 1 shows that even under $H_{TSU}$, $I < II$ need not be the most probable rank order. Recall that $H_{TSU}$ is $G(x) = F(x - \theta)$, where $\theta > 0$, $F(x) + F(-x) = 1$, and $F(a + c) - F(a) \ge F(b + c) - F(b)$ whenever $b > a > 0$ and $c > 0$.

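EXAMPLE 1. Let

$$f(x) = \begin{cases} 0, & x < -5/2, \\ \gamma/2, & -5/2 \le x < -1/2, \\ 1 - 2\gamma, & -1/2 \le x \le 1/2, \\ \gamma/2, & 1/2 < x \le 5/2, \\ 0, & 5/2 < x, \end{cases}$$

and $g(x) = f(x - 1)$.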

Let $A$ be the rank order in which every observation from the first sample is less than every observation from the second sample, except for one observation from the first sample that is larger than all of the other observations. For example, when $m = 4$ and $n = 2$, $A$ is the rank order 000110. The result will be proved by showing that for some $\gamma$, $m$, and $n$,

$$P(A) > P(I < II). \qquad (5.2)$$

Let $B$ be the event that every observation from the second sample lies in the interval $(1/2, 3/2)$, and let $\bar{B}$ be its complement. Let $C_i$ be the event that $m - i$ observations from the first sample are less than $1/2$ and the remaining $i$ observations from the first sample lie in $(1/2, 3/2)$. Let $D_i$ be the event that $m - i$ observations from the first sample are less than $1/2$, $i - 1$ observations from the first sample lie in $(1/2, 3/2)$, and one observation from the first sample lies in $(3/2, 5/2)$. Then


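$$\begin{aligned} P(A) - P(I < II) &= P(B)[P(A \mid B) - P(I < II \mid B)] + P(\bar B)[P(A \mid \bar B) - P(I < II \mid \bar B)] \\ &= P(B)\Big\{\sum_{i=1}^{m}[P(AC_i \mid B) + P(AD_i \mid B)] - \sum_{i=0}^{m} P(I < II \cdot C_i \mid B)\Big\} \\ &\quad + P(\bar B)[P(A \mid \bar B) - P(I < II \mid \bar B)]. \end{aligned} \qquad (5.3)$$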





It is clear that

(a) $P(B) = (1 - 2\gamma)^n$,

(b) $P(\bar B) = 1 - (1 - 2\gamma)^n$,

(c) $P(AC_i \mid B) = P(I < II \cdot C_i \mid B)$, $i = 1, \dots, m$,

(d) $P(I < II \cdot C_0 \mid B) = (1 - \gamma)^m$,

(e) $P(AD_1 \mid B) = m\gamma(1 - \gamma)^{m-1}/2$,

(f) $P(AD_i \mid B) \ge 0$, $i = 2, \dots, m$.


Hence,

$$P(A) - P(I < II) \ge (1 - 2\gamma)^n\Big[\frac{m\gamma}{2}(1 - \gamma)^{m-1} - (1 - \gamma)^m\Big] + [1 - (1 - 2\gamma)^n][P(A \mid \bar B) - P(I < II \mid \bar B)]. \qquad (5.4)$$


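Let $\gamma = k/m$, hold $n$ and $k$ fixed, and let $m \to \infty$. Then $(1 - 2\gamma)^n \to 1$, and both $(1 - \gamma)^{m-1}$ and $(1 - \gamma)^m$ tend to $e^{-k}$. The second term of (5.4) tends to 0, because its bracket lies between $-1$ and $1$. The right side of (5.4) therefore tends to $e^{-k}(k/2 - 1)$, so for $k > 2$ and sufficiently large $m$,

$$P(A) - P(I < II) > 0.$$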


Hence the desired result is obtained. 
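The limiting argument can be checked numerically. In this minimal sketch, the function names are hypothetical. The bound on the second term uses only the fact that the bracketed difference in (5.4) lies between $-1$ and $1$.

```python
import math


def first_term_54(m, n, k):
    """First term on the right of (5.4), with gamma = k / m."""
    g = k / m
    return (1 - 2 * g) ** n * (m * g * (1 - g) ** (m - 1) / 2 - (1 - g) ** m)


def second_term_bound(m, n, k):
    """Upper bound 1 - (1 - 2 gamma)^n on the absolute value of the second term."""
    g = k / m
    return 1 - (1 - 2 * g) ** n


def limit_first_term(k):
    """Limit of the first term as m -> infinity with n and k fixed."""
    return math.exp(-k) * (k / 2 - 1)


m, n = 10**6, 2
for k in (1, 3, 4):
    lower = first_term_54(m, n, k) - second_term_bound(m, n, k)
    print(k, limit_first_term(k), lower > 0)
```

For $k = 3$ and $k = 4$, the guaranteed lower bound on $P(A) - P(I < II)$ is already positive at $m = 10^6$, $n = 2$. For $k = 1$, it is not.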

Although the example above concerns the most restrictive of the slippage alternatives, it holds only for large $m$. A counterexample against $H_S$ that holds for small $m$ and $n$ is the following.

EXAMPLE 2. Let 


WA WA WA WA A 


ijn 


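Then, so long as $\varepsilon < 1 - m^{-1/n}$,

$$P(I\,a\!<II) - P(I < II) = \varepsilon^m[m(1 - \varepsilon)^n - 1] > 0. \qquad (5.8)$$

When $n = 1$, this difference is maximized at $\varepsilon = (m - 1)/(m + 1)$, where

$$P(I\,a\!<II) - P(I < II) = \left(\frac{m - 1}{m + 1}\right)^{m+1}. \qquad (5.9)$$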






(Note: Theorem 5.1.A$_1$ states that this last result is actually the best possible.)
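The maximization in (5.9) can be checked with exact rational arithmetic. This is a minimal sketch that assumes the form of (5.8) given above, $\varepsilon^m[m(1-\varepsilon)^n - 1]$. The name `diff_58` is illustrative.

```python
from fractions import Fraction


def diff_58(eps, m, n):
    """Right-hand side of (5.8): eps^m [m (1 - eps)^n - 1]."""
    return eps**m * (m * (1 - eps) ** n - 1)


for m in range(2, 7):
    eps_star = Fraction(m - 1, m + 1)                  # claimed maximizer when n = 1
    best = diff_58(eps_star, m, 1)
    assert best == Fraction(m - 1, m + 1) ** (m + 1)   # the value stated in (5.9)
    # no point of a fine grid in (0, 1) does better
    assert all(diff_58(Fraction(j, 1000), m, 1) <= best for j in range(1, 1000))

# For m = n = 2 the expression matches the second difference given in (5.11).
eps = Fraction(1, 5)
assert diff_58(eps, 2, 2) == eps**2 * (2 * eps**2 - 4 * eps + 1)
```

Exact fractions avoid any doubt about rounding near the maximum.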
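Using the same distributions with $m = n = 2$, we obtain the following complete set of rank-order probabilities:

$$\begin{aligned} P(I < II) &= P(0011) = \varepsilon^4 + 2\varepsilon^3(1 - \varepsilon) + \varepsilon^2(1 - \varepsilon)^2 + R_\varepsilon, \\ P(I\,a\!<II) &= P(0101) = 2\varepsilon^2(1 - \varepsilon)^2 + R_\varepsilon, \\ P(0110) &= 2\varepsilon^3(1 - \varepsilon) + 2\varepsilon^2(1 - \varepsilon)^2 + R_\varepsilon, \\ P(1001) &= R_\varepsilon, \\ P(II\,a\!<I) &= P(1010) = R_\varepsilon, \\ P(II < I) &= P(1100) = \varepsilon^2(1 - \varepsilon)^2 + R_\varepsilon, \end{aligned} \qquad (5.10)$$

where $R_\varepsilon = 2\varepsilon(1 - \varepsilon)^3/3 + (1 - \varepsilon)^4/6$.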

For every $\varepsilon$, the rank order $II < I$, intuitively the least probable, has a greater probability than the rank orders 1001 and 1010. However, each rank order beginning with 1 is less probable than any rank order beginning with 0. Also, $P(0110) > P(0101)$ for all $\varepsilon$. Finally,


$$P(0110) - P(0011) = \varepsilon^2(1 - 2\varepsilon), \qquad P(0101) - P(0011) = \varepsilon^2(2\varepsilon^2 - 4\varepsilon + 1). \qquad (5.11)$$

The first difference is positive for $\varepsilon < 1/2$, and the second is positive for $\varepsilon < 1 - \sqrt{2}/2$.
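The $m = n = 2$ probabilities in (5.10) can be checked for consistency with exact fractions: they should sum to 1 and reproduce both differences in (5.11). The dictionary keys are the 0/1 rank-order strings of the text, and `probs_510` is an illustrative name.

```python
from fractions import Fraction


def probs_510(eps):
    """The six rank-order probabilities of (5.10) for m = n = 2."""
    R = 2 * eps * (1 - eps) ** 3 / 3 + (1 - eps) ** 4 / 6
    return {
        "0011": eps**4 + 2 * eps**3 * (1 - eps) + eps**2 * (1 - eps) ** 2 + R,
        "0101": 2 * eps**2 * (1 - eps) ** 2 + R,
        "0110": 2 * eps**3 * (1 - eps) + 2 * eps**2 * (1 - eps) ** 2 + R,
        "1001": R,
        "1010": R,
        "1100": eps**2 * (1 - eps) ** 2 + R,
    }


for eps in (Fraction(1, 10), Fraction(1, 4), Fraction(2, 5)):
    p = probs_510(eps)
    assert sum(p.values()) == 1                                          # a distribution
    assert p["0110"] - p["0011"] == eps**2 * (1 - 2 * eps)               # (5.11), first
    assert p["0101"] - p["0011"] == eps**2 * (2 * eps**2 - 4 * eps + 1)  # (5.11), second
    assert p["0110"] >= p["0101"] and p["1100"] >= p["1001"]
```

At $\varepsilon = 1/4$, 0101 is more probable than 0011. At $\varepsilon = 2/5$, which exceeds $1 - \sqrt{2}/2 \approx 0.293$, it is not.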

As a result of the preceding examples it is clear that under alternatives such 
as slippage the probabilities of the rank orders will not be uniformly ordered. 
The following theorem summarizes the information regarding uniform ordering 
for these alternatives. The results are meager since they are mostly for sample 
sizes that do not occur in practice. 

THEOREM 5.1. 


A$_1$: If $n = 1$ and $H_S$, then $P(I\,a\!<II) - P(I < II) \le [(m - 1)/(m + 1)]^{m+1}$.


Ao: If Hs, then P(I < II) > e 2 _ 


B: If $m = 2$, $n = 1$, and $H_{TS}$, then $P(I < II) > P(I\,a\!<II) > P(II < I)$.

C: If $m = 3$, $n = 1$, and $H_{TSU}$, then $P(I < II) > P(I\,a\!<II) > P(II\,a\!<I) > P(II < I)$.


PROOF. These results are obtained by elementary manipulation from the
definitions of the probabilities involved. The fact that all of the probabilities 
can be expressed as single integrals involving the c.d.f.’s is the unifying and 
simplifying feature of the statement and proof of the theorem. 

Example 3 below illustrates a situation under $H_S$ in which the probabilities of the rank orders are uniformly ordered. Uniformly most powerful rank order tests can therefore be constructed for all combinations of sample sizes.

EXAMPLE 3. Let $X_1, \dots, X_m$ be a sample from the rectangular distribution on $(0, 1)$, and let $Y_1, \dots, Y_n$ be an independent sample from the rectangular distribution on $(0, L)$, where $L > 1$. Then







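1. The probability of a rank order depends only on the length of its last run of 1's.

2. The longer the last run of 1's, the more probable the rank order.

Let $A$ denote a specific rank order. Then

$$P(A) = \sum_{i=0}^{n} P(A \mid i \text{ of the } Y\text{'s} > 1)\,P(i \text{ of the } Y\text{'s} > 1) = \sum_{i=0}^{n} \binom{n}{i}\Big(1 - \frac{1}{L}\Big)^{i}\Big(\frac{1}{L}\Big)^{n-i}\binom{m + n - i}{m}^{-1} G_i(A),$$

where $G_i(A) = 1$ if $A$ can occur when $i$ of the $Y$'s exceed all of the $X$'s (that is, if the last run of 1's in $A$ has length at least $i$), and $G_i(A) = 0$ otherwise. Given that exactly $i$ of the $Y$'s exceed 1, the remaining $m + n - i$ observations are independent and uniform on $(0, 1)$, so each arrangement of them has probability $\binom{m+n-i}{m}^{-1}$. Both results follow immediately.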
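The sum above is easy to evaluate exactly. The following sketch assumes the convention used in the examples of this section: a rank order is a string in which '0' marks an observation from the first sample ($X$) and '1' one from the second ($Y$). The helper names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def all_rank_orders(m, n):
    """All rank orders of m 0's (first sample) and n 1's (second sample)."""
    N = m + n
    for ones in combinations(range(N), n):
        yield "".join("1" if k in ones else "0" for k in range(N))


def last_run_of_ones(z):
    """Length of the final run of 1's in the rank order z."""
    return len(z) - len(z.rstrip("1"))


def prob_example3(z, L):
    """P(A = z) for X ~ U(0, 1), Y ~ U(0, L), L > 1, via the sum over i."""
    m, n = z.count("0"), z.count("1")
    q = 1 - Fraction(1) / L                    # P(Y > 1)
    total = Fraction(0)
    for i in range(last_run_of_ones(z) + 1):   # G_i(z) = 1 exactly for these i
        total += comb(n, i) * q**i * (1 - q) ** (n - i) / comb(m + n - i, m)
    return total


probs = {z: prob_example3(z, Fraction(3, 2)) for z in all_rank_orders(2, 3)}
assert sum(probs.values()) == 1
```

Because every term of the sum is positive, $P(A)$ depends only on the length of the last run of 1's and increases with it, as claimed.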

Example 2, together with Theorem 5.1.A$_2$, shows that $H_S$ guarantees that $I < II$ is the most probable rank order only when $m = n = 1$. For $m = n = 2$, the same example gives further evidence that, for these alternatives, the criterion of Section 4 is inadequate for constructing optimum tests.

Example 1, together with Theorem 5.1.C, shows that $H_{TSU}$ implies that $I < II$ is the most probable rank order only for certain $m$ and $n$. The example could also be used to show that rank orders other than the one treated are sometimes more probable than $I < II$. Thus, even for this more restrictive alternative, the methods of Section 4 do not appear to be applicable.

Example 3 is a situation under slippage where it is actually possible to con- 
struct the best test. The more common statistical situations will be discussed 


in the next two sections. For these cases, it will turn out that the hypotheses in- 
duce a partial ordering of the probabilities of rank orders which are intermediate 
between the orderings given by the examples of this section. For the alternatives 
discussed in these latter sections, the partial ordering will be adequate to give a 
useful criterion for the construction of admissible tests. Finally, in Section 7 a 
case is treated where it is possible to construct various types of best tests and 
their operating characteristics are given. 


6. Monotone likelihood ratio alternatives. In the following theorem it is shown 
that for alternatives of the monotone likelihood ratio type it is possible to give an 
easily applied necessary criterion for the admissibility of rank order tests. 

THEOREM 6.1. Suppose the random variables $X_1, \dots, X_m, Y_1, \dots, Y_n$ are mutually independent, the $X$'s have density function $h(x, \theta_1)$, and the $Y$'s have density function $h(x, \theta_2)$, where $h(x_1, \theta_1)h(x_2, \theta_2) - h(x_1, \theta_2)h(x_2, \theta_1) \ge 0$ whenever $x_2 > x_1$. Let the rank orders $z$ and $z'$ be identical except for their $i$th and $j$th elements ($i < j$), which are (0, 1) for $z$ and (1, 0) for $z'$. Then $z$ is more probable than $z'$.

PROOF. We have

$$P(Z = z) - P(Z = z') = m!\,n! \int\!\cdots\!\int_{x_1 < \cdots < x_{m+n}} \prod_{k \ne i, j} h(x_k, \theta_{1 + z_k})\,[h(x_i, \theta_1)h(x_j, \theta_2) - h(x_i, \theta_2)h(x_j, \theta_1)] \prod_{k=1}^{m+n} dx_k.$$

By assumption (recall that $x_i < x_j$), the integrand is nonnegative. It is strictly positive on a set of positive measure unless $h(x, \theta_1) = h(x, \theta_2)$ almost everywhere. The result follows.

Thus, when $m = n = 2$, the rank order $z = (0101)$ must be placed in the critical region with probability one before the rank order $z' = (1001)$ is placed there with nonzero probability. In the equal-sample case, the one-sided Smirnov test [6] rejects for large values of the statistic

$$\max_{1 \le i \le m+n} (i - 2v_i), \qquad (6.2)$$

where $v_i = \sum_{j=1}^{i} z_j$. However, the Smirnov statistic takes the same value, 1, for both of these rank orders. The Smirnov procedure can therefore lead to inadmissible tests of $H_0$ against $H_M$.

Many procedures proposed for testing $H_0$ against $H_M$ are based on statistics of the form

$$\sum_{i=1}^{m+n} c_i z_i, \qquad (6.3)$$

where the $c_i$'s form an increasing (decreasing) sequence and large (small) values of (6.3) are critical. Typical examples are:

1. The Wilcoxon statistic [9], where $c_i = i$ is an increasing sequence.

2. The $c_1$ statistic [11], where the coefficients $c_i$, the expected values of the $i$th order statistic in a sample of $m + n$ observations from the standardized normal distribution, form an increasing sequence.

3. The $T$ statistic (introduced in Section 7), where $c_i = \sum_{j=i}^{m+n} 1/j$ is a decreasing sequence.

Statistics of the form (6.3) satisfy the admissibility criterion of Theorem 6.1. If rank orders $z$ and $z'$ are related as in that theorem, the corresponding values of the statistic differ by $c_j - c_i$, which is positive (negative) when large (small) values are critical. Note, however, that being of the form (6.3) is not a sufficient condition for admissibility.
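For the $m = n = 2$ pair discussed after Theorem 6.1, the following sketch contrasts the Smirnov statistic (6.2) with a statistic of the form (6.3). It assumes the text's 0/1 encoding of rank orders, and the function names are illustrative. The Smirnov statistic ties the two rank orders, whereas the Wilcoxon statistic ($c_i = i$) ranks 0101 above 1001, as Theorem 6.1 requires.

```python
def smirnov(z):
    """One-sided Smirnov statistic (6.2): max over i of (i - 2 v_i)."""
    v, best = 0, None
    for i, zi in enumerate(z, start=1):
        v += int(zi)                   # v_i = number of 1's among the first i
        value = i - 2 * v
        best = value if best is None else max(best, value)
    return best


def linear_rank_stat(z, c):
    """Statistic of the form (6.3): sum over i of c_i z_i."""
    return sum(ci * int(zi) for ci, zi in zip(c, z))


wilcoxon_c = [1, 2, 3, 4]              # c_i = i when m + n = 4
for z in ("0101", "1001"):
    print(z, smirnov(z), linear_rank_stat(z, wilcoxon_c))
# 0101 1 6
# 1001 1 5
```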


7. Lehmann alternatives. Lehmann [8] introduced alternatives of the form $H_L$ to study nonparametric procedures when the alternatives themselves are stated in nonparametric form. In this section we continue the study of these alternatives and show that for them it is possible to construct optimum critical regions of various types. The $H_L$ alternatives are of statistical interest because they include the extreme-value and exponential distributions, as pointed out in Section 3.

7.a. General formulas. One reason the nonparametric treatment of the $H_L$ alternatives can be so complete from the Neyman-Pearson point of view is that the probabilities of the rank orders can be given in explicit form. This is done in Corollary 7.a.1.

THEOREM 7.a.1. Suppose the random variables $X_1, \dots, X_N$ are mutually independent and $X_i$ has cumulative distribution function $[H(x)]^{\lambda_i}$, where $\lambda_i > 0$ and $H(x)$ is a continuous distribution function. Then

$$P(X_1 \le X_2 \le \cdots \le X_N) = \Big(\prod_{i=1}^{N} \lambda_i\Big) \Big/ \prod_{i=1}^{N} \Big(\sum_{j=1}^{i} \lambda_j\Big).$$

By numbering the $X$'s appropriately, the probability of any ordering can be found.
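PROOF. Let

$$P = P(X_1 \le X_2 \le \cdots \le X_{N-1} \le X_N). \qquad (7.a.1)$$

Then

$$P = \int\!\cdots\!\int_{x_1 \le x_2 \le \cdots \le x_N} \prod_{i=1}^{N} d[H(x_i)]^{\lambda_i}. \qquad (7.a.2)$$

Making the change of variables

$$y_i = H(x_i), \qquad (7.a.3)$$

we have

$$P = \int\!\cdots\!\int_{0 \le y_1 \le \cdots \le y_N \le 1} \prod_{i=1}^{N} \lambda_i y_i^{\lambda_i - 1}\, dy_i = \Big(\prod_{i=1}^{N}\lambda_i\Big) \Big/ \prod_{i=1}^{N}\Big(\sum_{j=1}^{i}\lambda_j\Big). \qquad (7.a.4)$$

The last equality holds because integrating successively over $y_1, \dots, y_k$ leaves $y_{k+1}^{\lambda_1 + \cdots + \lambda_k}\prod_{i=1}^{k}\lambda_i/(\lambda_1 + \cdots + \lambda_i)$.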
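Theorem 7.a.1 can be checked with exact fractions. Ties have probability zero, so the $N!$ orderings are disjoint and exhaustive, and the formula summed over all permutations of the $\lambda$'s must equal 1. This is a minimal sketch, and the name `prob_in_order` is illustrative.

```python
from fractions import Fraction
from itertools import permutations


def prob_in_order(lams):
    """P(X_1 <= ... <= X_N) when X_i has c.d.f. H(x)**lams[i] (Theorem 7.a.1)."""
    numerator, denominator, partial = Fraction(1), Fraction(1), Fraction(0)
    for lam in lams:
        numerator *= lam
        partial += lam                     # lambda_1 + ... + lambda_i
        denominator *= partial
    return numerator / denominator


lams = [Fraction(1), Fraction(2), Fraction(5)]
assert sum(prob_in_order(p) for p in permutations(lams)) == 1
# N = 2: P(X_1 <= X_2) = lambda_2 / (lambda_1 + lambda_2)
assert prob_in_order([Fraction(1), Fraction(3)]) == Fraction(3, 4)
```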


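The following corollary is equivalent to Equation (4.5) of Lehmann [8].

COROLLARY 7.a.1. Under $H_L$ the probability of a rank order $z$ is

$$\frac{m!\,n!\,\Delta_1^m \Delta_2^n}{\prod_{i=1}^{m+n}\Big(\sum_{j=1}^{i}[(1 - z_j)\Delta_1 + z_j \Delta_2]\Big)} = \frac{m!\,n!\,\delta^n}{\prod_{i=1}^{m+n}(u_i + \delta v_i)},$$

where $\delta = \Delta_2/\Delta_1$.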
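Corollary 7.a.1 is straightforward to evaluate. This sketch assumes the text's encoding, with '0' for the first sample and '1' for the second, so that $u_i$ and $v_i$ count the 0's and 1's among the first $i$ elements. The helper names are hypothetical. With an exact `Fraction` for $\delta$, the probabilities of all rank orders sum to 1, and at $\delta = 1$ each rank order has probability $1/\binom{m+n}{m}$.

```python
from fractions import Fraction
from itertools import combinations
from math import factorial


def all_rank_orders(m, n):
    N = m + n
    for ones in combinations(range(N), n):
        yield "".join("1" if k in ones else "0" for k in range(N))


def rank_order_prob(z, delta):
    """Corollary 7.a.1: m! n! delta^n / prod_i (u_i + delta v_i)."""
    m, n = z.count("0"), z.count("1")
    u = v = 0
    denominator = 1
    for zi in z:
        if zi == "0":
            u += 1
        else:
            v += 1
        denominator *= u + delta * v
    return factorial(m) * factorial(n) * delta**n / denominator


assert sum(rank_order_prob(z, Fraction(5, 2)) for z in all_rank_orders(3, 3)) == 1
assert all(rank_order_prob(z, Fraction(1)) == Fraction(1, 20) for z in all_rank_orders(3, 3))
print(round(float(rank_order_prob("000111", 3.0546)), 4))   # 0.2549
```

The printed value matches the entry for 000111 at $\delta = 3.0546$ in the $m = n = 3$ part of Table I.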


The quantity $\prod_{i=1}^{m+n}(u_i + \delta v_i)$ in Corollary 7.a.1 is a polynomial in $\delta$ whose coefficients depend on the rank order $z$. For convenience, denote this polynomial by $f_z(\delta)$. The nonzero coefficients of $f_z(\delta)$ are positive integers. Since $u_i + v_i = i$, setting $\delta = 1$ shows that the coefficients sum to $(m + n)!$.

Let $r = \min(r_1, \dots, r_m)$ be the rank of the smallest observation from the first sample. Then the smallest power of $\delta$ with a nonzero coefficient is $r - 1$. In particular, if $z_1 = 0$, the polynomial has a constant term. Let $s = \min(s_1, \dots, s_n)$ be the rank of the smallest observation from the second sample. Then the largest power of $\delta$ with a nonzero coefficient is $m + n - s + 1$. In particular, if $z_1 = 1$, the polynomial is of degree $m + n$. If $J = \max(r, s)$, the coefficient of $\delta^{r-1}$ is $(J - 1)!\prod_{i=J}^{m+n} u_i$, and the coefficient of $\delta^{m+n-s+1}$ is $(J - 1)!\prod_{i=J}^{m+n} v_i$.
All of the nonzero coefficients of f,(6) are = m!. 
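The coefficient properties stated above can be checked by expanding $f_z(\delta)$ one factor at a time. This sketch assumes the text's 0/1 encoding. Its example $z = 100101$ has $r = 2$, $s = 1$, and $J = 2$, so the lowest power should be $r - 1 = 1$ and the degree $m + n - s + 1 = 6$. The function names are illustrative.

```python
from math import factorial


def f_poly(z):
    """Coefficients c[k] of delta**k in f_z(delta) = prod_i (u_i + delta v_i)."""
    coeffs = [1]
    u = v = 0
    for zi in z:
        if zi == "0":
            u += 1
        else:
            v += 1
        new = [0] * (len(coeffs) + 1)
        for k, c in enumerate(coeffs):
            new[k] += c * u          # take u_i from this factor
            new[k + 1] += c * v      # take delta * v_i from this factor
        coeffs = new
    return coeffs


def lowest_and_highest_power(coeffs):
    nonzero = [k for k, c in enumerate(coeffs) if c != 0]
    return nonzero[0], nonzero[-1]


z = "100101"
c = f_poly(z)
assert sum(c) == factorial(len(z))             # delta = 1 gives (m + n)!
assert lowest_and_highest_power(c) == (1, 6)   # r - 1 and m + n - s + 1
assert c[1] == 36 and c[6] == 12               # (J-1)! prod u_i and (J-1)! prod v_i, i >= J
```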

Let $z$ be a rank order for sample sizes $m$ and $n$. Let $z^0$ ($z^1$) be the rank order for sample sizes $m + 1$ and $n$ ($m$ and $n + 1$) whose first $m + n$ elements are those of $z$ and whose $(m + n + 1)$-st element is 0 (1). Then

$$f_{z^0}(\delta) = [(m + 1) + n\delta]\, f_z(\delta), \qquad f_{z^1}(\delta) = [m + (n + 1)\delta]\, f_z(\delta). \qquad (7.a.5)$$


Suppose two rank orders $z$ and $z'$ are identical except in their $k$th and $(k + 1)$-st elements, which are (0, 1) for $z$ and (1, 0) for $z'$. Their probabilities are then related by

$$P(Z = z) = \frac{u_k + \delta v_k + \delta - 1}{u_k + \delta v_k}\, P(Z = z'), \qquad (7.a.6)$$

where $u_k$ and $v_k$ are computed for $z$. The probability of $I < II$, in which the whole first sample lies below the second, is

$$P(I < II) = n!\,\delta^n \Big/ \prod_{i=1}^{n}(m + i\delta), \qquad (7.a.7)$$

and the probability of $II < I$, in which the whole second sample lies below the first, is

$$P(II < I) = m! \Big/ \prod_{i=1}^{m}(i + n\delta). \qquad (7.a.8)$$


7.b. Composite alternatives. In this section we consider optimum tests of $H_0$ against $H_L$, where $\Delta_1$ and $\Delta_2$ are restricted only by $\delta = \Delta_2/\Delta_1 > 1$. Theorem
6.1 gives an easily applied necessary criterion for admissibility. It will be possible 
in this section to go farther and find more details about the structure of optimal 
tests than was possible in Section 4. 

The statistic $T(z)$, or simply $T$, defined as

$$T(z) = \sum_{i=1}^{m+n} v_i/i, \qquad (7.b.1)$$

is used in the next theorem. $T(z)$ is the central topic of the remaining subsections.
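A direct computation of (7.b.1), sketched under the text's 0/1 encoding, with $v_i$ the number of 1's among the first $i$ elements. The name `T_stat` is illustrative. Exact fractions reproduce, for example, the $T$ value 1.7214 that Table I lists for the rank order 0001111.

```python
from fractions import Fraction


def T_stat(z):
    """T(z) = sum_{i=1}^{m+n} v_i / i, with v_i the number of 1's among the first i."""
    total, v = Fraction(0), 0
    for i, zi in enumerate(z, start=1):
        v += int(zi)
        total += Fraction(v, i)
    return total


assert T_stat("000111") == Fraction(23, 20)          # 1/4 + 2/5 + 3/6 = 1.15
assert round(float(T_stat("0001111")), 4) == 1.7214
```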
THEOREM 7.b.1. Under $H_L$, if $T(z) < T(z')$, then there exists a $\delta^* > 1$ such that, for $\delta$ in the interval $(1, \delta^*)$, $z$ is more probable than $z'$. In fact, $\delta^*$ may be chosen independently of $z$ and $z'$.
PROOF. Write $u_i + \delta v_i = i[1 + (\delta - 1)v_i/i]$. Then Corollary 7.a.1 gives

$$P(Z = z) = \frac{m!\,n!}{(m + n)!}\,\delta^n[1 - (\delta - 1)T(z) + O((\delta - 1)^2)]. \qquad (7.b.2)$$

Hence

$$P(Z = z) - P(Z = z') = \frac{m!\,n!}{(m + n)!}\,\delta^n(\delta - 1)[T(z') - T(z) + O(\delta - 1)]. \qquad (7.b.3)$$

Thus, for any $z$ and $z'$ with $T(z) < T(z')$, there exists a $\delta^* > 1$ such that $P(Z = z) > P(Z = z')$ for $1 < \delta < \delta^*$. Since the number of rank orders is finite, $\delta^*$ can be chosen independently of $z$ and $z'$, which proves the theorem.
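Theorem 7.b.1 can be seen numerically. This sketch uses exact fractions and $\delta$ just above 1, and assumes the text's 0/1 encoding with illustrative helper names. Sorting the $m = n = 3$ rank orders by decreasing probability gives exactly the order of increasing $T$, which is the order used in Table I. The 20 values of $T$ are distinct in this case, so the comparison is unambiguous.

```python
from fractions import Fraction
from itertools import combinations
from math import factorial


def all_rank_orders(m, n):
    N = m + n
    for ones in combinations(range(N), n):
        yield "".join("1" if k in ones else "0" for k in range(N))


def T_stat(z):
    total, v = Fraction(0), 0
    for i, zi in enumerate(z, start=1):
        v += int(zi)
        total += Fraction(v, i)
    return total


def rank_order_prob(z, delta):
    """Corollary 7.a.1: m! n! delta^n / prod_i (u_i + delta v_i)."""
    m, n = z.count("0"), z.count("1")
    u = v = 0
    denominator = Fraction(1)
    for zi in z:
        if zi == "0":
            u += 1
        else:
            v += 1
        denominator *= u + delta * v
    return factorial(m) * factorial(n) * delta**n / denominator


orders = list(all_rank_orders(3, 3))
delta = Fraction(100001, 100000)                      # delta slightly above 1
by_T = sorted(orders, key=T_stat)                     # increasing T
by_P = sorted(orders, key=lambda z: rank_order_prob(z, delta), reverse=True)
assert by_T == by_P
```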
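THEOREM 7.b.2. Under $H_L$, the rank order $z$ is more probable than the rank order $z'$ for sufficiently large $\delta$ if $s > s'$, or if $s = s'$ and

$$(J - 1)!\prod_{i=J}^{m+n} v_i < (J' - 1)!\prod_{i=J'}^{m+n} v_i',$$

where $s$, $J$, $v_i$ refer to $z$ and $s'$, $J'$, $v_i'$ refer to $z'$.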

PROOF. The conclusion follows immediately from the discussion after Corollary
7.a.1, since the coefficient of the term of highest degree of a polynomial dominates 
its behavior for large values of the argument. 

Thus, for $z$ to be always more probable than $z'$ under $H_L$, it is necessary that $s$ and $s'$ satisfy the conditions of Theorem 7.b.2. When they do, $z$ is more probable than $z'$ for all $\delta > 1$ if and only if the polynomial

$$f_{z,z'}(\delta) = f_{z'}(\delta) - f_z(\delta) \qquad (7.b.4)$$

has no real roots larger than 1. The reason is that, by Corollary 7.a.1, this condition on (7.b.4) is equivalent to the denominator in the probability of $z$ being smaller than the denominator in the probability of $z'$.

Figure 1 shows relationships between the probabilities of rank orders. The numbers in the figure are those assigned to the rank orders in Table I, printed at the end of the text. Suppose $i < j$. If $i$ and $j$ can be connected by a sequence of ascending segments, that is, segments each connecting a smaller number to a larger one, then rank order $i$ is always (under $H_L$) more probable than rank order $j$. Otherwise, rank order $i$ is more probable than rank order $j$ for some values of $\delta$ and less probable for others.

The diagrams in the figure were drawn using the criteria of Theorem 6.1 and (7.b.4).

When the diagram for a particular combination of sample sizes is a simple chain, a uniformly most powerful rank order test can be constructed for every level of significance. When $m = 1$, or when $m = 2$ and $n = 2$, 3, 4, or 5, uniformly most powerful rank order tests of $H_0$ against $H_L$ can be formed for every level of significance. Not all cases with $m = 2$ give a simple ordering; $m = 2$, $n = 6$ is one that does not.

The diagram for m = n = 3 is the least complicated one where there is not 
a simple ordering. In this case it would not be possible to construct uniformly 
most powerful rank order procedures for levels of significance in the intervals 
(0.45, 0.55) and (0.75, 0.85). Since these are unusual levels, there would be no 
practical difficulty. The diagram for $m = 3$ and $n = 4$ is similar: there is no simple ordering, yet uniformly most powerful procedures exist for all of the usual levels of significance.

The case $m = 3$, $n = 5$ shows how the lack of a simple ordering can make it difficult to find optimum procedures at a reasonable level of significance, such as 0.10. There are 56 rank orders, so a randomized test at the 0.10 level requires choosing probabilities $a_1, \dots, a_{56}$ whose sum is 5.6. The results of Section 4 give

$$a_1 = \cdots = a_4 = 1, \quad a_7 = a_9 = \cdots = a_{56} = 0, \quad a_5 + a_6 + a_8 = 1.6, \quad 0 \le a_i \le 1.$$

Taking $a_5 = 0.15$, $a_6 = 1$, and $a_8 = 0.45$ yields the most stringent rank order test. For this test, the maximum difference from the envelope power function of all rank order tests, which has been minimized, is 0.0021 and occurs at $\delta = 2.2$ and at $\delta = 16$. The numerical work required for such an analysis is prohibitive except for very small samples.
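The size and power of a randomized rank order test follow directly from Corollary 7.a.1. This sketch assumes the text's 0/1 encoding and numbers the rank orders by increasing $T$, as in Table I. Ties in $T$ would be broken by generation order, an arbitrary choice of this sketch. The helper names are hypothetical. It evaluates the $m = 3$, $n = 5$ most stringent test described above.

```python
from fractions import Fraction
from itertools import combinations
from math import factorial


def all_rank_orders(m, n):
    N = m + n
    for ones in combinations(range(N), n):
        yield "".join("1" if k in ones else "0" for k in range(N))


def T_stat(z):
    total, v = Fraction(0), 0
    for i, zi in enumerate(z, start=1):
        v += int(zi)
        total += Fraction(v, i)
    return total


def rank_order_prob(z, delta):
    """Corollary 7.a.1: m! n! delta^n / prod_i (u_i + delta v_i)."""
    m, n = z.count("0"), z.count("1")
    u = v = 0
    denominator = Fraction(1)
    for zi in z:
        if zi == "0":
            u += 1
        else:
            v += 1
        denominator *= u + delta * v
    return factorial(m) * factorial(n) * delta**n / denominator


def randomized_power(a, m, n, delta):
    """Power of the test rejecting rank order number i with probability a[i - 1]."""
    orders = sorted(all_rank_orders(m, n), key=T_stat)   # numbering by increasing T
    return sum(ai * rank_order_prob(z, delta) for ai, z in zip(a, orders))


# a_1..a_4 = 1, a_5 = 0.15, a_6 = 1, a_7 = 0, a_8 = 0.45, all others 0
a = [Fraction(1)] * 4 + [Fraction(3, 20), Fraction(1), Fraction(0), Fraction(9, 20)]
a += [Fraction(0)] * (56 - len(a))
size = randomized_power(a, 3, 5, Fraction(1))            # should be 5.6 / 56
power = randomized_power(a, 3, 5, Fraction(11, 5))       # power at delta = 2.2
```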

When $m = n = 4$, uniformly most powerful rank order tests exist for levels of significance in the intervals (0, 0.043) and (0.086, 0.129). To obtain a test at exactly the 0.05 level, we use the criterion of Section 4, which gives

$$a_1 = a_2 = a_3 = 1, \quad a_6 = \cdots = a_{70} = 0, \quad a_4 + a_5 = 0.5, \quad 0 \le a_i \le 1.$$

The most stringent procedure is given by $a_4 = 0.00156$ and $a_5 = 0.49844$. Its maximum deviation from the envelope power function is 0.00005, at $\delta = 15$.

When $m = 4$, $n = 5$, uniformly most powerful rank order tests exist for levels of significance in the intervals (0, 0.024) and (0.063, 0.079). For a test at exactly the 0.05 level, the $a_i$ of the 126 rank orders must sum to 6.3. The results of Section 4 then fix five of the $a_i$ at 1 and all but three of the remaining $a_i$ at 0. The three undetermined $a_i$ must sum to 1.3, with $0 \le a_i \le 1$.

When $m = n = 5$, uniformly most powerful rank order tests exist for levels of significance in the intervals (0, 0.012) and (0.032, 0.036). For a test at the 0.05 level, Section 4 gives

$$a_1 = \cdots = a_9 = a_{11} = a_{12} = 1, \quad a_{16} = \cdots = a_{252} = 0, \quad a_{10} + a_{13} + a_{14} + a_{15} = 1.6, \quad 0 \le a_i \le 1.$$

Interestingly, a test near the 0.05 level leaves only half as many $a_i$'s undetermined by the criterion of admissibility as a test at exactly the 0.05 level. For example, a test at the $11/252 = 0.044$ level requires

$$a_1 = \cdots = a_9 = a_{11} = 1, \quad a_{13} = \cdots = a_{252} = 0, \quad a_{10} + a_{12} = 1, \quad 0 \le a_i \le 1.$$

This completes the discussion of the construction of exact optimum rank order tests of $H_0$ against $H_L$. We have seen that for small sample sizes it is possible to construct the uniformly most powerful rank order tests or most stringent
rank order tests. However, the amount of computing becomes much larger as the 
sample sizes increase, and these exact methods will not be applicable for most of 
the situations arising in practice. The fact (see Table II) that most stringent 
tests for the cases examined are never much more powerful than any admissible 
test would lead to the conjecture that it is not necessary to find the best test but 
some reasonable substitute. In the next subsections we develop the theory of 
such a test. 

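7.c. Exact distribution of the limiting statistic. With the notation

$$D_{N,i} = \sum_{j=i}^{N} \frac{1}{j}, \qquad (7.c.1)$$

where $N = m + n$, the statistic introduced before Theorem 7.b.1 can be expressed as

$$T(z) = \sum_{i=1}^{m+n} \frac{v_i}{i} = \sum_{i=1}^{N} z_i D_{N,i} = \sum_{i=1}^{n} D_{N,s_i}, \qquad (7.c.2)$$

where $s_1, \dots, s_n$ are the ranks of the observations from the second sample.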

Theorem 7.b.1 can be reinterpreted as saying that the locally most powerful rank order test of $H_0$ against $H_L$ is based on small values of $T(z)$. Motivated by this, we examine the exact distribution of $T(z)$ under $H_0$ in this subsection and its limiting distribution for large samples in the next.

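LEMMA 7.c.1. Let $U = \sum_{i=1}^{N} a_i Z_i$ and $V = \sum_{i=1}^{N} b_i Z_i$. Then, under $H_0$, $EU = (n/N)\sum_{i=1}^{N} a_i$ and

$$\operatorname{cov}(U, V) = \frac{mn}{N^2(N - 1)}\Big(N\sum_{i=1}^{N} a_i b_i - \sum_{i=1}^{N} a_i \sum_{i=1}^{N} b_i\Big).$$

PROOF. The proof is routine, using the identity

$$\operatorname{cov}(U, V) = \sum_{i} a_i b_i \operatorname{var}(Z_i) + \sum_{i \ne j} a_i b_j \operatorname{cov}(Z_i, Z_j)$$

and the fact that, under $H_0$,

$$EZ_i = \frac{n}{N}, \qquad \operatorname{var} Z_i = \frac{mn}{N^2}, \qquad \operatorname{cov}(Z_i, Z_j) = -\frac{mn}{N^2(N - 1)} \quad (i \ne j).$$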


THEOREM 7.c.1. Under $H_0$ the mean and variance of $T$ are $ET = n$ and

$$\sigma_T^2 = \frac{mn}{N - 1}\Big(1 - \frac{D_{N,1}}{N}\Big).$$

PROOF. In Lemma 7.c.1, let $a_i = b_i = D_{N,i}$, and note that $\sum_{i=1}^{N} D_{N,i} = N$ and $\sum_{i=1}^{N} D_{N,i}^2 = 2N - D_{N,1}$.
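Theorem 7.c.1 can be confirmed by brute force for small samples. Under $H_0$, all $\binom{N}{n}$ sets of second-sample ranks are equally likely, and $T = \sum_i D_{N,s_i}$ by (7.c.2). This is a minimal sketch with exact fractions, and the helper names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def D(N, i):
    """D_{N,i} = sum_{j=i}^{N} 1/j, as in (7.c.1)."""
    return sum(Fraction(1, j) for j in range(i, N + 1))


def exact_moments_T(m, n):
    """Mean and variance of T under H_0 by enumerating all rank orders."""
    N = m + n
    values = [sum(D(N, s) for s in ranks)            # T = sum of D_{N, s_i}
              for ranks in combinations(range(1, N + 1), n)]
    mean = sum(values) / len(values)
    var = sum((t - mean) ** 2 for t in values) / len(values)
    return mean, var


def theorem_7c1(m, n):
    N = m + n
    return Fraction(n), Fraction(m * n, N - 1) * (1 - D(N, 1) / N)


for m, n in [(2, 3), (3, 3), (3, 4), (4, 5)]:
    assert exact_moments_T(m, n) == theorem_7c1(m, n)
```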
The Wilcoxon statistic [9], which can be written as

$$W = \sum_{i=1}^{N} i Z_i, \qquad (7.c.3)$$

is used to test the hypothesis that two samples come from populations differing only in location. $H_{EV}$, a special case of $H_L$, is a hypothesis of this type, so $T$ and $W$ will sometimes be used for the same purpose. It is therefore of interest to know something about their joint distribution.

THEOREM 7.c.2. Under $H_0$, the covariance of $T$ and $W$ is $-mn/4$.

PROOF. In Lemma 7.c.1, let $a_i = D_{N,i}$ and $b_i = i$, and note that $\sum_{i=1}^{N} i D_{N,i} = N(N + 3)/4$.
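Theorem 7.c.2 can be checked the same way by enumeration, with $W$ the sum of the second-sample ranks as in (7.c.3). The sketch also computes the correlation implied by Theorems 7.c.1 and 7.c.2, using the standard null variance $\operatorname{var} W = mn(N + 1)/12$ (an input not derived in this section), and shows it approaching $-\sqrt{3}/2$ for large samples. Helper names are hypothetical.

```python
import math
from fractions import Fraction
from itertools import combinations


def D(N, i):
    """D_{N,i} = sum_{j=i}^{N} 1/j, as in (7.c.1)."""
    return sum(Fraction(1, j) for j in range(i, N + 1))


def exact_cov_T_W(m, n):
    """cov(T, W) under H_0 by enumerating the equally likely rank orders."""
    N = m + n
    pairs = []
    for ranks in combinations(range(1, N + 1), n):   # ranks of the second sample
        pairs.append((sum(D(N, s) for s in ranks), sum(ranks)))
    k = len(pairs)
    mean_T = sum(t for t, _ in pairs) / k
    mean_W = Fraction(sum(w for _, w in pairs), k)
    return sum((t - mean_T) * (w - mean_W) for t, w in pairs) / k


def corr_T_W(m, n):
    """Correlation from Theorems 7.c.1 and 7.c.2, with var W = mn(N + 1)/12."""
    N = m + n
    D1 = math.fsum(1 / j for j in range(1, N + 1))
    var_T = m * n / (N - 1) * (1 - D1 / N)
    var_W = m * n * (N + 1) / 12
    return (-m * n / 4) / math.sqrt(var_T * var_W)


for m, n in [(2, 3), (3, 4), (4, 4)]:
    assert exact_cov_T_W(m, n) == Fraction(-m * n, 4)
print(corr_T_W(500, 500), -math.sqrt(3) / 2)
```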

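COROLLARY 7.c.2. Under $H_0$, the correlation between $T$ and $W$ is

$$-\frac{\sqrt{3}}{2}\left[\frac{N - 1}{(N + 1)(1 - D_{N,1}/N)}\right]^{1/2},$$

which for large $N$ is approximately $-\sqrt{3}/2 = -0.8660\ldots$ (here the null variance $\operatorname{var} W = mn(N + 1)/12$ is used).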


This work is similar to Terry's study ([11], Section 9) of the correlation between $W$ and $c_1$. In the case Terry considered, the limiting correlation is $\sqrt{3/\pi}$ (= 0.9772), which is somewhat larger in absolute value than the $-\sqrt{3}/2$ (= $-0.8660$) found above.

For each rank order $z$, we can form its complement $z^c$: where an element of $z$ is 0 (1), the corresponding element of $z^c$ is 1 (0). Since $\sum_i D_{N,i} = N$, we obtain $T(z) + T(z^c) = N$. The following recursion formulas are also available:

$$T(z^0) = T(z) + n/(m + n + 1) \qquad (7.c.4)$$

and

$$T(z^1) = T(z) + (n + 1)/(m + n + 1). \qquad (7.c.5)$$

Here $z^0$ and $z^1$ (used with (7.a.5) in Section 7.a) are formed from $z$ by appending an additional element at the right end: 0 for $z^0$ and 1 for $z^1$. These results are useful in preparing tables of the distribution of $T$ under the null hypothesis.
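The complement identity and the recursions (7.c.4) and (7.c.5) are exactly the steps one would use to build null tables of $T$ incrementally. This minimal sketch checks them with exact fractions. It assumes the text's 0/1 encoding, and the helper names are illustrative.

```python
from fractions import Fraction
from itertools import combinations


def all_rank_orders(m, n):
    N = m + n
    for ones in combinations(range(N), n):
        yield "".join("1" if k in ones else "0" for k in range(N))


def T_stat(z):
    total, v = Fraction(0), 0
    for i, zi in enumerate(z, start=1):
        v += int(zi)
        total += Fraction(v, i)
    return total


def complement(z):
    """Swap the roles of the two samples: 0 <-> 1."""
    return "".join("1" if zi == "0" else "0" for zi in z)


m, n = 2, 3
N = m + n
for z in all_rank_orders(m, n):
    assert T_stat(z) + T_stat(complement(z)) == N
    assert T_stat(z + "0") == T_stat(z) + Fraction(n, N + 1)       # (7.c.4)
    assert T_stat(z + "1") == T_stat(z) + Fraction(n + 1, N + 1)   # (7.c.5)
```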

7.d. Large sample distribution of the limiting statistic. We first show that under $H_0$ the statistic $T$ has a limiting normal distribution, and then indicate that under $H_L$ it is also asymptotically normal and yields an asymptotically most powerful test.

We need the result of Epstein and Sobel ([4], Appendix A) that if $X_1, \dots, X_N$ are independently distributed, each with density function

$$f(x) = \begin{cases} 0, & x < 0, \\ e^{-x}, & x \ge 0, \end{cases} \qquad (7.d.1)$$

then

$$EX_{N,i} = \sum_{j=i'}^{N} \frac{1}{j} = D_{N,i'}, \qquad i' = N - i + 1, \qquad (7.d.2)$$

where $X_{N,i}$ is the $i$th order statistic in a sample of $N$ and $D_{N,i}$ was introduced in (7.c.1). This result, combined with a theorem of Dwass [2], yields

THEOREM 7.d.1. Under $H_0$, let $N \to \infty$ in such a way that $n/(m + n)$ tends to a constant $\lambda$ other than 0 or 1. Then the distribution of the random variable

$$\frac{T - \lambda N}{\sqrt{\lambda(1 - \lambda)N}}$$

approaches a normal distribution with zero mean and unit variance.

A rigorous treatment of the limiting distribution of $T$ under $H_L$ would be complicated and is not given here, since we are primarily interested in exact rather than limiting properties. However, it is reasonable to conjecture (see Dwass [1], Hoeffding [5], and Lehmann [7]) that $T$, when properly normalized, has a limiting Gaussian distribution and yields an asymptotically most powerful test under alternatives of the form $H_L$.


8. Acknowledgment. For the continuing encouragement and guidance of Pro- 
fessor Howard Levene of Columbia University from the initiation to the com- 
pletion of this research, I wish to express sincere appreciation. 








TABLE I 
Distribution of Rank Orders under $H_L$

This table gives the probabilities of some of the rank orders (see Section 2) for all combinations of sample sizes $1 \le m \le n \le 5$ and alternatives of the form $H_L$ (see Section 3). The rank orders are arranged in order of increasing values of the statistic $T$. Hence, for values of $\delta$ slightly greater than 1, they run from most probable to least probable. The value of $\delta$ at the head of each probability column is the value required for a test with power $1 - \beta$ at the $\alpha$ level of significance when the best similar region test of $H_0$ against $H_E$ is used (see Eisenhart [3], Chapter 8, Sections 4 and 6.2, and Tables 8.3 and 8.4). The probabilities of the rank orders were computed using (7.a.5), (7.a.6), and (7.a.7).

Note that this table is not symmetric in $m$ and $n$; see, however, the remarks at the end of Section 7.c. Results are presented for $n \ge m$ because in this situation the rank order procedures compare more favorably with the parametric procedures for comparable alternatives (see Table II).


9.0000 7 37 361.0000 
0.5000 .9000 9972 
1.5000 1000 0048 


N = 3, 


1667 


6667 


N= 


4.1073 


0011 | ‘ | .5408 
0101 ‘ 2118 
0110 | é . 1404 
1001 : .0516 
1010 : 0342 


-0213 


16.4334 


. 8966 
0546 
.0290 
.0199 


8.4783 
. 7238 
1527 
. 0891 
.0180 
0105 


.0059 


38.4940 


9622 
0250 
.0128 


133.6569 


9889 
0074 
0037 


99.4200 


.9818 
0101 
0051 
0034 


40.8104 


9305 
0445 
0231 
0011 
0006 


0003 







01111
10111 
11011 
11101 
11110 


00111
01011 
01101 
01110 
10011 


10101 
10110 
11001 
11010 
11100 


011111
101111 
110111 
111011 
111101 


111110 


001111 
010111
011011 
011101 
011110 


100111 
101011 
101101 
101110 
110011 


110101 
110110 
111001 
111010 
111100 


TABLE I—Continued 




N = 5, m = 1, n = 4

7.0891 

7167 . 7552 
7167 1065 
. 2167 0608 
. 5500 0434 
. 8000 0341 
N = 5, m = 2, n = 3
3.7769 

4333 4394 
9333 1840 
. 2667 242 
5167 0963 
9333 0487 
. 2667 0329 
.5167 0255 
. 7667 0208 
0167 0161 
.3500 .0122 
N = 6, m = 1, n = 5
6.9826 

5500 7312 
5500 1047 
.0500 0598 
3833 0428 
6333 0336 
8333 0278 
N = 6, m = 2, n = 4
3.6173 

1000 3743 
6000 1621 
9333 1106 
1833 0862 
. 3833 0716 
.6000 .0448 
9333 .0305 
1833 .0238 
3833 .0198 
335 0195 
6833 0152 
8833 .0126 
.0167 .O115 
.2167 0096 
.4667 0078 




4 


15.5199 


.8769 

0565 
.0301 
.0208 
.0159 


6277 


0919 
0667 
0214 
0128 
.0093 
0073 
. 0054 
.0038 


15.0031 


8615 
0574 
0306 
0211 
0162 
.0132 


6.5817 


5619 
1482 
OS9S 
0656 
.0522 
0225 
0136 
0100 
.0079 
0078 


.0058 
.0046 
.0042 
.0033 
.0026 


32.0958 


9378 
0292 
O15! 
0104 
0079 


1.0147 


.7316 
.1218 
.0688 
.0486 
O11] 


.0063 
.0044 

0034 
.0024 
.0017 


30.9851 


9297 
.0300 
.0155 

0105 
.0080 


.0064 


10.0534 


67 sé 
. 1226 
.0700 
.0497 
0388 


.0122 
.0070 
.0050 
.0039 
.0038 
0027 
.0021 
0019 
.0015 
.0011 


| 


PS 


86.3753 


9763 
.0113 
.0057 
.00388 
.0029 


27.9416 


. 8800 
0608 
0320 
.0218 
0022 
0011 
0008 
.0006 
.0004 
.0003 


9718 
.0122 
0062 
.0041 
.0031 


0025 


23,1841 


. 8397 
0694 
0369 
0253 
0193 
.0030 
.0016 
0011 
.0008 
. 0008 


0006 
0004 
0004 
-0005 
.0002 





TABLE I—Continued
N = 6, m = 3, n = 3

                   δ = 3.0546  δ = 5.4436  δ = …   δ = …
 i  R.O.    T      P           P           P       P
 1  000111  1.1500  .2549      .4270       .5305   .7535
 2  001011  1.4833  .1513      .1721       .1652   .1111
 3  001101  1.7333  .1130      .1128       .1017   .0613
 4  001110  1.9333  .0922      .0854       .0746   .0426
 5  010011  1.9833  .0746      .0534       .0383   .0115
 6  010101  2.2333  .0557      .0350       .0236   .0063
 7  010110  2.4333  .0455      .0265       .0173   .0044
 8  011001  2.5667  .0396      .0219       .0140   .0034
 9  011010  2.7667  .0324      .0166       .0102   .0024
10  100011  2.9833  .0244      .0098       .0050   .0006
11  011100  3.0167  .0259      .0123       .0074   .0017
12  100101  3.2333  .0182      .0064       .0031   .0003
13  100110  3.4333  .0149      .0049       .0023   .0002
14  101001  3.5667  .0129      .0040       .0018   .0002
15  101010  3.7667  .0106      .0031       .0014   .0001
16  101100  4.0167  .0085      .0023       .0010   .0001
17  110001  4.0667  .0086      .0024       .0010   .0001
18  110010  4.2667  .0070      .0018       .0007   .0001
19  110100  4.5167  .0056      .0013       .0005   .0001
20  111000  4.8500  .0043      .0010       .0004   .0000


N = 7, m = 2, n = 5

                    δ = …   δ = 6.2518  δ = 9.512…  δ = 20.7442
 i  R.O.     T      P       P           P           P
 1  0011111  2.8143  .3286  .5136       .6370       .8076
 2  0101111  3.3143  .1453  .1416       .1212       .0743
 3  0110111  3.6476  .0997  .0865       .0697       .0398
 4  0111011  3.8976  .0780  .0635       .0496       .0274
 5  0111101  4.0976  .0650  .0507       .0388       .0210
 6  0111110  4.2643  .0562  .0424       .0320       .0170
 7  1001111  4.3143  .0412  .0226       .0127       .0036
 8  1010111  4.6476  .0283  .0138       .0073       .0019
 9  1011011  4.8976  .0221  .0102       .0052       .0013
10  1011101  5.0976  .0184  .0081       .0041       .0010
11  1100111  5.1476  .0182  .0080       .0040       .0010
12  1011110  5.2643  .0160  .0068       .0034       .0008
13  1101011  5.3976  .0142  .0059       .0029       .0007
14  1101101  5.5976  .0118  .0047       .0023       .0005
15  1110011  5.7310  .0108  .0042       .0020       .0005
16  1101110  5.7643  .0103  .0039       .0019       .0004
17  1110101  5.9310  .0090  .0034       .0016       .0003
18  1110110  6.0976  .0078  .0028       .0013       .0003
19  1111001  6.1810  .0074  .0027       .0012       .0002
20  1111010  6.3476  .0064  .0023       .0010       .0002
21  1111100  6.5476  .0055  .0019       .0008       .0002







RANK ORDER STATISTICS 609 


TABLE I—Continued
N = 7, m = 3, n = 4

 i   R.O.      T        2.8968   4.9243   6.8455   14.8480
 1   0001111   1.7214   .1911    .3436    .4485    .6739
 2   0010111   2.0548   .1171    .1489    .1521    .1200
 3   0011011   2.3048   .0886    .0996    .0954    .0676
 4   0011101   2.5048   .0729    .0763    .0707    .0475
 5   0100111   2.5548   .0601    .0503    .0388    .0151
 6   0011110   2.6714   .0627    .0626    .0566    .0368
 7   0101011   2.8048   .0455    .0336    .0243    .0085
 8   0101101   3.0048   .0374    .0258    .0180    .0060
 9   0110011   3.1381   .0328    .0214    .0146    .0046
10   0101110   3.1714   .0322    .0211    .0144    .0046
11   0110101   3.3381   .0270    .0165    .0108    .0033
12   0110110   3.5048   .0232    .0135    .0087    .0025
13   1000111   3.5548   .0207    .0102    .0057    .0010
14   0111001   3.5881   .0217    .0124    .0079    .0023
15   0111010   3.7548   .0187    .0102    .0063    .0018
16   1001011   3.8048   .0157    .0068    .0035    .0006
17   0111100   3.9548   .0159    .0083    .0050    .0014
18   1001101   4.0048   .0129    .0052    .0026    .0004
19   1010011   4.1381   .0113    .0043    .0021    .0003
20   1001110   4.1714   .0112    .0043    .0021    .0003
21   1010101   4.3381   .0093    .0033    .0016    .0002
22   1010110   4.5048   .0080    .0027    .0013    .0002
23   1011001   4.5881   .0075    .0025    .0012    .0001
24   1100011   4.6381   .0076    .0026    .0012    .0002
25   1011010   4.7548   .0065    .0020    .0010    .0001
26   1100101   4.8381   .0063    .0020    .0009    .0001
27   1011100   4.9548   .0055    .0016    .0006    .0001
28   1100110   5.0048   .0054    .0016    .0007    .0001
29   1101001   5.0881   .0051    .0015    .0007    .0001
30   1101010   5.2548   .0044    .0012    .0006    .0001
31   1110001   5.4214   .0040    .0011    .0005    .0001
32   1101100   5.4548   .0037    .0010    .0003    .0001
33   1110010   5.5881   .0034    .0009    .0004    .0001
34   1110100   5.7881   .0029    .0007    .0003    .0001
35   1111000   6.0381   .0024    .0006    .0002    .0000





TABLE I—Continued
N = 8, m = 3, n = 5

 i   R.O.       T        2.8029   4.6300   6.4006   13.0618
 1   00011111   2.3464   .1507    .2872    .3904    .6126
 2   00101111   2.6798   .0941    .1300    .1394    .1220
 3   00110111   2.9298   .0718    .0881    .0885    .0697
 4   00111011   3.1298   .0594    .0680    .0660    .0493
 5   01001111   3.1798   .0495    .0462    .0377    .0174
 6   00111101   3.2964   .0513    .0560    .0531    .0383
 7   01010111   3.4298   .0378    .0314    .0239    .0099
 8   00111110   3.4393   .0455    .0479    .0447    .0314
 9   01011011   3.6298   .0312    .0242    .0178    .0070
10   01100111   3.7631   .0275    .0203    .0145    .0055
11   01011101   3.7964   .0270    .0199    .0144    .0054
12   01011110   3.9393   .0239    .0170    .0121    .0045
13   01101011   3.9631   .0227    .0156    .0108    .0039
14   01101101   4.1298   .0196    .0129    .0088    .0030
15   10001111   4.1798   .0177    .0100    .0059    .0013


N = 8, m = 4, n = 4

 i   R.O.       T        2.5893   4.2454   5.6371   11.8205
 1   00001111   1.4619   .1056    .2156    .2966    .5295
 2   00010111   1.7119   .0756    .1190    .1374    .1429
 3   00011011   1.9119   .0609    .0854    .0928    .0849
 4   00100111   2.0452   .0494    .0572    .0540    .0310
 5   00011101   2.0786   .0519    .0678    .0712    .0610
 6   00011110   2.2214   .0457    .0568    .0583    .0479
 7   00101011   2.2452   .0398    .0410    .0365    .0184
 8   00101101   2.4119   .0339    .0326    .0280    .0132
 9   00110011   2.4952   .0310    .0283    .0237    .0106
10   01000111   2.5452   .0275    .0218    .0163    .0048
11   00101110   2.5548   .0299    .0273    .0229    .0104
12   00110101   2.6619   .0264    .0225    .0182    .0076
13   01001011   2.7452   .0222    .0156    .0110    .0029
14   00110110   2.8048   .0233    .0189    .0149    .0060
15   00111001   2.8619   .0221    .0175    .0137    .0054
16   01001101   2.9119   .0189    .0124    .0084    .0021
17   01010011   2.9952   .0173    .0108    .0071    .0017
18   00111010   3.0048   .0195    .0147    .0112    .0042
19   01001110   3.0548   .0167    .0104    .0069    .0016
20   01010101   3.1619   .0147    .0086    .0054    .0012
21   00111100   3.1714   .0170    .0122    .0091    .0033
22   01010110   3.3048   .0130    .0072    .0045    .0009
23   01100011   3.3286   .0128    .0071    .0044    .0010
24   01011001   3.3619   .0123    .0067    .0041    .0009
25   01100101   3.4952   .0109    .0057    .0034    .0007







TABLE I—Continued
N = 8, m = 4, n = 4—Cont.

 i   R.O.       T        2.5893   4.2454   5.6371   11.8205
26   01011010   3.5048   .0109    .0056    .0034    .0007
27   10000111   3.5452   .0106    .0051    .0029    .0004
28   01100110   3.6381   .0097    .0047    .0028    .0005
29   01011100   3.6714   .0095    .0047    .0027    .0005
30   01101001   3.6952   .0091    .0044    .0026    .0005

N = 9, m = 4, n = 5

 i   R.O.        T        2.4942   3.9646   5.2287   10.2816
 1   000011111   2.0175   .0751    .1645    .2377    .4511
 2   000101111   2.2675   .0547    .0945    .1155    .1359
 3   000110111   2.4675   .0445    .0689    .0792    .0824
 4   001001111   2.6008   .0365    .0475    .0479    .0332
 5   000111011   2.6341   .0382    .0552    .0613    .0598
 6   000111101   2.7770   .0338    .0465    .0505    .0472
 7   001010111   2.8008   .0297    .0347    .0329    .0261
 8   000111110   2.9020   .0305    .0452    .0432    .0391
 9   001011011   2.9675   .0255    .0278    .0254    .0146
10   001100111   3.0508   .0233    .0243    .0217    .0118
11   010001111   3.1008   .0209    .0191    .0154    .0059
12   001011101   3.1103   .0226    .0234    .0210    .0115
13   001101011   3.2175   .0200    .0195    .0168    .0086
14   001011110   3.2353   .0204    .0227    .0179    .0096
15   010010111   3.3008   .0170    .0140    .0106    .0036
16   001101101   3.3603   .0178    .0164    .0139    .0068
17   001110011   3.4175   .0168    .0153    .0128    .0062
18   010011011   3.4675   .0146    .0112    .0082    .0026
19   001101110   3.4853   .0160    .0159    ?        .0057
20   010100111   3.5508   .0133    .0098    .0070    .0021
21   001110101   3.5603   .0150    .0129    .0106    .0049
22   010011101   3.6103   .0129    .0094    .0067    .0020
23   001110110   3.6853   .0135    .0125    .0090    .0041
24   010101011   3.7175   .0114    .0079    .0054    .0015
25   001111001   3.7270   .0131    .0108    .0086    .0038
26   010011110   3.7353   .0117    .0091    .0057    .0017
27   001111010   3.8520   .0118    .0104    .0073    .0032
28   010101101   3.8603   .0101    .0066    .0044    .0012
29   011000111   3.8841   .0100    .0065    .0044    .0012
30   010110011   3.9175   .0096    .0062    .0041    .0011
31   010101110   3.9853   .0092    .0064    .0038    .0010
32   001111100   3.9948   .0106    .0090    .0062    .0026





TABLE I—Continued
N = 10, m = 5, n = 5

 i   R.O.         T        2.3226   3.6030
 1   0000011111   ?        .0404    .0982    .3308    (one entry illegible)
 2   0000101111   ?        .0319    .0646    .0860    .1285
 3   0000110111   ?        .0270    .0496    .0625    .0820
 4   0001001111   ?        .0240    .0391    .0451    .0433
 5   0000111011   ?        .0237    .0409    .0498    .0609
 6   0001010111   ?        .0203    .0300    .0328    .0276
 7   0000111101   ?        .0213    .0351    .0418    .0488
 8   0000111110   ?        .0195    .0309    .0362    .0409
 9   0001011011   ?        .0179    .0247    .0262    .0205
10   0010001111   ?        .0167    .0209    .0204    .0120
11   0001100111   ?        .0168    .0223    ?        .0171
12   0001011101   ?        .0161    .0212    .0220    .0164
13   0010010111   ?        .0141    .0160    .0148    .0077
14   0001101011   ?        .0148    .0184    .0185    .0127
15   0001011110   ?        .0147    .0187    .0190    .0138
16   0001101101   ?        .0133    .0158    .0155    .0102
17   0010011011   ?        .0124    .0132    .0119    .0057
18   0001110011   2.8980   .0128    .0149    .0145    .0093
19   0010100111   2.9218   .0117    .0119    .0105    .0047
20   0001101110   2.9675   .0122    .0139    .0134    .0086
21   0010011101   2.9897   .0112    .0114    .0100    .0045
22   0001110101   3.0230   .0115    .0128    .0122    .0075
23   0100001111   3.0552   .0101    .0091    .0073    .0024
24   0010101011   3.0647   .0103    .0099    .0084    .0035
25   0010011110   3.1008   .0102    .0100    .0086    .0038
26   0001110110   3.1341   .0106    .0113    .0105    .0063
27   0001111001   3.1659   .0103    .0109    .0101    .0060
28   0011000111   3.1718   .0094    .0085    .0071    .0013
29   0010101101   3.1897   .0092    .0085    .0070    .0028
30   0100010111   3.2218   .0085    .0070    .0053    .0016
31   0010110011   3.2313   .0089    .0080    .0060    .0026
32   0001111010   3.2770   .0095    .0096    .0087    .0050
33   0010101110   3.3008   .0085    .0074    .0061    .0024
34   0011001011   3.3147   .0092    .0071    .0057    .0021
35   0010110101   3.3563   .0080    .0069    .0055    .0021
36   0100011011   3.3647   .0075    .0057    .0042    .0012
37   0001111100   3.4020   .0086    .0084    .0075    .0042
38   0100100111   3.4218   .0070    .0052    .0037    .0010
39   0011001101   3.4397   .0074    .0061    .0047    .0017
40   0010110110   3.4675   .0074    .0061    .0046    .0017
41   0011010011   3.4813   .0071    .0057    .0045    .0016
42   0100011101   3.4897   .0067    .0050    .0036    .0009
43   0010111001   3.4992   .0071    .0058    .0046    .0017
44   0011001110   3.5508   .0068    .0053    .0041    .0014
45   0100101011   3.5647   .0064    .0043    .0030    .0007
46   0100011110   3.6008   .0061    .0043    .0031    .0008
47   0011010101   3.6063   .0064    .0049    .0038    .0013
48   0010111010   3.6103   .0066    .0051    .0039    .0014
49   0101000111   3.6718   .0057    .0037    .0025    .0003










TABLE II
Power Functions of Nonparametric Tests
In each case, the first entry gives the power of the test based on T; the second entry gives the power of the best rank order test; the third entry gives the power of the test based on c₁.

α = .10, B = .50

                        n
        1       2       3       4       5
 1    .1800   .2771   .3146   .3776   .4387
      .1800   .2771   .3146   .3776   .4387
      .1800   .2771   .3146   .3776   .4387
 2    .2624   .3245   .4394   .4553   .4839
      .2624   .3245   .4394   .4553   .4839
      .2624   .3245   .4394   .4553   .4839
 3    .2387   .3667   .4062   .4332   .4563
      .2387   .3667   .4062   .4332   .4570
      .2387   .3667   .4062   .4268   .4482
 4    .2535   .3494   .3917   .4289   .4398
      .2535   .3494   .3938   .4289   .4402
      .2535   .3494   .3828   .4107   .4272
 5    .2642   .3513   .3933   .4195   .4370
      .2642   .3513   .3933   .4199   .4375
      .2642   .3495   .3737   .3948   .4322

α = .10, B = .25

                        n
        1       2       3       4       5
 1    .1929   .2491   .3586   .4384   .5169
      .1929   .2491   .3586   .4384   .5169
      .1929   .2491   .3586   .4384   .5169
 2    .2169   .4343   .6377   .6360   .6638
      .2169   .4343   .6377   .6360   .6638
      .2169   .4343   .6377   .6360   .6638
 3    .3171   .5604   .5991   .6302   .6531
      .3171   .5604   .5991   .6302   .6570
      .3171   .5604   .5991   .6173   .6383
 4    .3614   .5420   .5933   .6428   .6599
      .3614   .5420   .6021   .6428   .6635
      .3614   .5420   .5738   .6078   .6259
 5    .3983   .5585   .6122   .6375   .6558
      .3983   .5585   .6180   .6389   .6587
      .3983   .5349   .5693   .6264   .6179





I. RICHARD SAVAGE 


TABLE II—Continued
α = .05

 1    .1443   .1896
      .1443   .1896
      .1443   .1896
 2    .2421   .3658
      .2421   .3658
      .2421   .3658
 3    .3270   .5305
      .3270   .5305
      .3270   .5305
 4    .3996   .5098
      .3996   .5098
      .3996   .5098
 5    .4788   .5162
      .4788   .5162
      .4788   .4884

 1    .0997   .148?   .2915
      .0997   ?       .2915
      .0997   .148?   .2915
 2    .1478   .2792   .8114
      .1478   ?       .8114
      .1478   .2795   .8114
 3    .1941   .4285   .7638   .7904
      .1941   .428?   .7904
      .1941   .428?   .7904
 4    .2389   .8140
      .2389   .5888   .8213
      .2389   .5888   .7843
 5    .2823   .748?   .8334
      .2823   .748?   .8382
      .2823   .748?   .7830





REFERENCES 


[1] M. Dwass, "Contributions to the theory of rank order tests," Dissertation, University of North Carolina, 1952.
[2] M. Dwass, "On the asymptotic normality of certain rank order statistics," Ann. Math. Stat., Vol. 24 (1953), pp. 303-306.
[3] C. Eisenhart, M. W. Hastay, and W. A. Wallis, Selected Techniques of Statistical Analysis, McGraw-Hill Book Co., Inc., New York, 1947.







[4] B. Epstein and M. Sobel, Some Tests Based on the First r Ordered Observations Drawn from an Exponential Distribution, Stanford University Technical Report No. 6; Wayne University Technical Report No. 1, March 1, 1952.
[5] W. Hoeffding, "The power of certain nonparametric tests," mimeographed notes. Work sponsored by Office of Naval Research at the University of North Carolina, Chapel Hill, 1952.
[6] A. N. Kolmogorov, "Confidence limits for an unknown distribution function," Ann. Math. Stat., Vol. 12 (1941), pp. 461-463.
[7] E. L. Lehmann, "Consistency and unbiasedness of certain nonparametric tests," Ann. Math. Stat., Vol. 22 (1951), pp. 165-179.
[8] E. L. Lehmann, "The power of rank tests," Ann. Math. Stat., Vol. 24 (1953), pp. 23-42.
[9] H. B. Mann and D. R. Whitney, "On a test of whether one of two random variables is stochastically larger than the other," Ann. Math. Stat., Vol. 18 (1947), pp. 50-60.
[10] E. J. G. Pitman, Lecture notes on nonparametric statistics, Columbia University, New York (1948).
[11] M. E. Terry, "Some rank order tests which are most powerful against specific parametric alternatives," Ann. Math. Stat., Vol. 23 (1952), pp. 346-366.





THE ADMISSIBILITY OF HOTELLING’S T²-TEST


By CHARLES STEIN 
Stanford University 


1. Summary. In Section 3 we shall prove a theorem concerning the admissibility of certain tests of simple hypotheses in multivariate exponential families, based on a method of A. Birnbaum [1] and E. Lehmann. In Section 4 we compute the supporting hyperplanes of the convex acceptance region in some of the most common applications of Hotelling's T²-test and show that the theorem of Section 3 implies the admissibility of this test. In Section 5 we point out some of the limitations of the method of this paper.


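2. Introduction. We recall the definition of admissibility of a statistical test. Let $\mathcal{X}$ be a set, $\mathcal{A}$ a σ-algebra of subsets of $\mathcal{X}$, $\Theta$ a set, $\Theta_0$ a nonempty proper subset of $\Theta$, and for each $\theta \in \Theta$ let $P_\theta$ be a probability measure on $\mathcal{A}$. We observe a random element $X$ distributed in $\mathcal{X}$ according to $P_\theta$, with $\theta$ an unknown element of $\Theta$, and we want to test the hypothesis $H_0: \theta \in \Theta_0$. A test is an $\mathcal{A}$-measurable function $\varphi$ on $\mathcal{X}$ to the closed interval $[0, 1]$, with the interpretation that if we observe $X$, we reject $H_0$ with probability $\varphi(X)$. The test $\varphi_0$ is said to be admissible if there does not exist a test $\varphi$ such that

$$\int \varphi \, dP_{\theta_0} \le \int \varphi_0 \, dP_{\theta_0} \quad \text{for all } \theta_0 \in \Theta_0, \tag{1}$$

$$\int \varphi \, dP_{\theta} \ge \int \varphi_0 \, dP_{\theta} \quad \text{for all } \theta \in \Theta - \Theta_0, \tag{2}$$

with strict inequality for some $\theta_0 \in \Theta_0$ or some $\theta \in \Theta - \Theta_0$; i.e., a $\varphi$ which has a smaller probability of error for some parameter point and does not have a larger probability of error for any parameter point.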

An exponential family consists of a finite-dimensional real linear space $\mathcal{X}$, a measure $\mu$ on the σ-algebra $\mathcal{A}$ of all ordinary Borel subsets of $\mathcal{X}$, a subset $\Theta$ of the adjoint space $\mathcal{X}'$ (the linear space of all real-valued linear functions on $\mathcal{X}$) such that for all $\xi \in \Theta$

$$\psi(\xi) = \int e^{\xi x} \, d\mu(x) < \infty, \tag{3}$$

and $P$, the function on $\Theta$ to the set of probability measures on $\mathcal{A}$ given by

$$P_\xi(A) = \frac{1}{\psi(\xi)} \int_A e^{\xi x} \, d\mu(x) \tag{4}$$

for all $A \in \mathcal{A}$.
It is well known that any set of nonsingular multivariate normal distributions in the same space can be expressed as an exponential family. If $Y = (Y_1, \cdots, Y_p)$ is a random $p$-dimensional column vector (written horizontally to facilitate printing) normally distributed with mean $y_0$ and nonsingular covariance matrix $s_0$, its density (with respect to ordinary Lebesgue measure in $p$-space) is

$$\begin{aligned} &\frac{1}{(2\pi)^{p/2}(\det s_0)^{1/2}} \exp\left[-\tfrac{1}{2}(y - y_0)' s_0^{-1} (y - y_0)\right] \\ &\qquad = \frac{\exp\left(-\tfrac{1}{2} y_0' s_0^{-1} y_0\right)}{(2\pi)^{p/2}(\det s_0)^{1/2}} \exp\left\{ y_0' s_0^{-1} y - \tfrac{1}{2} \operatorname{tr}\left[s_0^{-1}(yy')\right]\right\}. \end{aligned} \tag{5}$$

Received December 8, 1954.
We take $\mathcal{X}$ to be the $(p + p(p+1)/2)$-dimensional real linear space of pairs $(y, s)$ with $s$ a symmetric $p \times p$ matrix, and denote by $(\eta, \Gamma)$, with $\eta$ a $p$-dimensional vector and $\Gamma$ a symmetric $p \times p$ matrix, the element of $\mathcal{X}'$ defined by

$$(\eta, \Gamma)(y, s) = \eta' y - \tfrac{1}{2} \operatorname{tr} \Gamma s. \tag{6}$$

Let $f$ be the mapping of $\mathcal{Y}$ (the space of $p$-dimensional vectors) into $\mathcal{X}$ given by

$$f(y) = (y, yy'), \tag{7}$$

let $\nu$ be ordinary Lebesgue measure in $\mathcal{Y}$, and let $\mu$, defined by

$$\mu(A) = \nu(f^{-1}A), \tag{8}$$

be the induced measure in $\mathcal{X}$. Let $\Theta$ be the set of $(\eta, \Gamma) \in \mathcal{X}'$ with $\Gamma$ positive definite, and define $P_{(\eta, \Gamma)}$ by (4). Then $(\mathcal{X}, \mu, \Theta, P)$ is an exponential family and $P_{(\eta, \Gamma)}$ is the distribution of $(Y, YY')$ if we put

$$\Gamma = s_0^{-1}, \qquad \eta = s_0^{-1} y_0. \tag{9}$$

Since the function $f$ is one-to-one and preserves measurability (and is therefore sufficient), this shows that the set of all nonsingular normal distributions in $p$-space (and therefore any subset) is an exponential family.
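Identity (5) and the parametrization (9) can be checked numerically. The Python sketch below takes $p = 2$ and arbitrary illustrative values of $y_0$, $s_0$ and $y$, none of which come from the paper. It evaluates the normal density directly and again in the exponential-family form $\exp\{\eta' y - \tfrac12 \operatorname{tr}\Gamma yy'\}$ with $\eta = s_0^{-1}y_0$ and $\Gamma = s_0^{-1}$. The helper names are hypothetical.

```python
import math


def inverse_2x2(m):
    """Return (inverse, determinant) of a nonsingular 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]], det


def quad_form(u, m, v):
    """u' m v for 2-vectors u, v and a 2x2 matrix m."""
    return sum(u[i] * m[i][j] * v[j] for i in range(2) for j in range(2))


def normal_density(y, y0, s0):
    """Bivariate normal density with mean y0 and covariance s0 (left side of (5))."""
    s0_inv, det = inverse_2x2(s0)
    d = [y[0] - y0[0], y[1] - y0[1]]
    return math.exp(-0.5 * quad_form(d, s0_inv, d)) / (2 * math.pi * math.sqrt(det))


def exponential_family_density(y, y0, s0):
    """The same density in the form (5), with eta = s0^{-1} y0 and Gamma = s0^{-1} as in (9)."""
    gamma, det = inverse_2x2(s0)
    eta = [gamma[i][0] * y0[0] + gamma[i][1] * y0[1] for i in range(2)]
    # (eta, Gamma)(y, yy') = eta'y - (1/2) tr(Gamma yy') = eta'y - (1/2) y'Gamma y, as in (6)
    xi_x = eta[0] * y[0] + eta[1] * y[1] - 0.5 * quad_form(y, gamma, y)
    # normalizing factor exp(-y0' s0^{-1} y0 / 2) / ((2 pi)^{p/2} (det s0)^{1/2}) with p = 2
    norm = math.exp(-0.5 * quad_form(y0, gamma, y0)) / (2 * math.pi * math.sqrt(det))
    return norm * math.exp(xi_x)


y0 = [1.0, -0.5]
s0 = [[2.0, 0.3], [0.3, 1.0]]
for y in ([0.0, 0.0], [1.5, -2.0], [-0.7, 0.4]):
    print(y, normal_density(y, y0, s0), exponential_family_density(y, y0, s0))
```

The two columns printed for each $y$ should agree up to rounding, because the $(Y, YY')$ representation is an exact rewriting of the density and not an approximation.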


3. A theorem on admissibility for tests in exponential families. 

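THEOREM: Let $(\mathcal{X}, \mu, \Theta, P)$ be an exponential family and $\Theta_0$ a nonempty proper subset of $\Theta$. Let $A$ be a closed convex subset of $\mathcal{X}$ such that for every $\xi \in \mathcal{X}'$ and real $c$ for which

$$\{x : \xi x > c\} \cap A = \emptyset \quad \text{(the empty set)} \tag{10}$$

there exists $\theta_1 \in \Omega = \{\xi : \int e^{\xi x} \, d\mu(x) < \infty\}$ such that there exist arbitrarily large $\lambda$ for which $\theta_1 + \lambda\xi \in \Theta - \Theta_0$. Then the test $\varphi_0$, defined by

$$\varphi_0(x) = \begin{cases} 0 & \text{if } x \in A, \\ 1 & \text{if } x \notin A, \end{cases} \tag{11}$$

i.e.,

$$\varphi_0(x) = 1 - \chi_A(x), \tag{12}$$

is admissible for testing the hypothesis that a random element $X$ of $\mathcal{X}$ is distributed according to some $P_\theta$ with $\theta \in \Theta_0$ against the alternatives $\theta \in \Theta - \Theta_0$.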

The reader will observe that the theorem essentially gives conditions under 
which a test is admissible for testing any simple hypothesis in the given expo- 
nential family. However, the above statement is more convenient for the appli- 
cation we are going to make. This theorem is an extension of a result and method 
of proof which appeared in a first draft of Birnbaum [1], suggested, I believe, by 
E. Lehmann. It is related to Theorem 3 in the final version of Birnbaum’s 
paper. 

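PROOF. We shall suppose a test $\varphi$ strictly better than $\varphi_0$ and obtain a contradiction. Thus, suppose

$$\int \varphi(x) \, dP_{\theta_0}(x) \le \int [1 - \chi_A(x)] \, dP_{\theta_0}(x), \tag{13}$$

$$\int \varphi(x) \, dP_{\theta}(x) \ge \int [1 - \chi_A(x)] \, dP_{\theta}(x) \tag{14}$$

for all $\theta_0 \in \Theta_0$ and $\theta \in \Theta - \Theta_0$, with strict inequality for at least one $\theta_0 \in \Theta_0$ or one $\theta \in \Theta - \Theta_0$. By (13),

$$\int [1 - \chi_A(x) - \varphi(x)] \, dP_{\theta_0}(x) \ge 0, \tag{15}$$

so that, since $P_\theta$ and $\mu$ are mutually absolutely continuous for any $\theta \in \Theta$, either

$$\mu\{x : 1 - \chi_A(x) - \varphi(x) \ne 0\} = 0, \tag{16}$$

or

$$\mu\{x : 1 - \chi_A(x) - \varphi(x) > 0\} > 0. \tag{17}$$

Since (16) would imply equality everywhere in (13) and (14), it is impossible. But

$$\{x : 1 - \chi_A(x) - \varphi(x) > 0\} = A' \cap B, \tag{18}$$

where $A'$ is the complement of $A$ and

$$B = \{x : \varphi(x) < 1\}. \tag{19}$$

Cover $A'$ with a denumerable collection $\mathcal{S}$ of open half-spaces disjoint from $A$. Then, by (17) and (18), for some half-space $S = \{x : \xi x > c\} \in \mathcal{S}$,

$$\mu(A' \cap B \cap S) > 0. \tag{20}$$

By hypothesis there exist $\theta_1 \in \Omega$ and arbitrarily large $\lambda > 0$ such that

$$\theta = \theta_1 + \lambda\xi \in \Theta - \Theta_0. \tag{21}$$

Then

$$\begin{aligned} \int [1 - \chi_A(x) - \varphi(x)] \, dP_\theta(x) &= \frac{\psi(\theta_1)}{\psi(\theta)} \int [1 - \chi_A(x) - \varphi(x)] \, e^{\lambda \xi x} \, dP_{\theta_1}(x) \\ &= \frac{\psi(\theta_1) e^{\lambda c}}{\psi(\theta)} \int [1 - \chi_A(x) - \varphi(x)] \, e^{\lambda(\xi x - c)} \, dP_{\theta_1}(x) \\ &= \frac{\psi(\theta_1) e^{\lambda c}}{\psi(\theta)} \left\{ \int_{\{x : \xi x > c\}} [1 - \chi_A(x) - \varphi(x)] \, e^{\lambda(\xi x - c)} \, dP_{\theta_1}(x) \right. \\ &\qquad \left. + \int_{\{x : \xi x \le c\}} [1 - \chi_A(x) - \varphi(x)] \, e^{\lambda(\xi x - c)} \, dP_{\theta_1}(x) \right\}. \end{aligned} \tag{22}$$

Since $S$ is disjoint from $A$, the integrand of the first integral in the final expression is nonnegative. By (20) it is positive on a set of positive $P_{\theta_1}$-measure on which $\xi x > c$, so this integral approaches $+\infty$ as $\lambda \to +\infty$. The second integral is bounded in absolute value by 1. Thus (22) is $> 0$ for sufficiently large $\lambda$, contradicting (14).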
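The mechanism behind (22) is easiest to see in a toy exponential family. In the Python sketch below, $\mu$ puts unit mass on a handful of points of the plane, so every integral becomes a finite sum. The support points, $\theta_1$, $\xi$, $c$ and the bounded function `g` are all hypothetical choices made for illustration. `g` plays the role of $1 - \chi_A - \varphi$: it equals 1 on $\{\xi x > c\}$ and is negative elsewhere. The sketch checks the first equality in (22) and shows $\int g \, dP_{\theta_1 + \lambda\xi}$ changing sign from negative to positive as $\lambda$ grows.

```python
import math

# Hypothetical support of the base measure mu (unit mass at each point of the plane).
SUPPORT = [(0.0, 0.0), (1.0, 0.5), (-1.0, 2.0), (2.0, -1.0), (0.5, 1.5)]


def pair(xi, x):
    """The linear functional xi applied to x, written xi x in the text."""
    return xi[0] * x[0] + xi[1] * x[1]


def psi(theta):
    """psi(theta) = integral of e^{theta x} d mu(x), as in (3)."""
    return sum(math.exp(pair(theta, x)) for x in SUPPORT)


def integral(g, theta):
    """Integral of g with respect to P_theta, with P_theta defined by (4)."""
    return sum(g(x) * math.exp(pair(theta, x)) for x in SUPPORT) / psi(theta)


def tilted_integral(g, theta1, xi, lam):
    """Right-hand side of the first equality in (22), computed under P_{theta1}."""
    theta = (theta1[0] + lam * xi[0], theta1[1] + lam * xi[1])
    weighted = integral(lambda x: g(x) * math.exp(lam * pair(xi, x)), theta1)
    return psi(theta1) / psi(theta) * weighted


theta1, xi, c = (0.0, 0.0), (1.0, 0.0), 1.5


def g(x):
    # bounded; equal to 1 on the half-space {xi x > c}, negative elsewhere
    return 1.0 if pair(xi, x) > c else -0.5


for lam in (0.0, 1.0, 3.0, 10.0):
    theta = (theta1[0] + lam * xi[0], theta1[1] + lam * xi[1])
    print(lam, integral(g, theta), tilted_integral(g, theta1, xi, lam))
```

Because the support is finite, the growth of the first integral in (22) shows up here as the weight $e^{\lambda \xi x}$ concentrating on the support point with the largest $\xi x$, which lies in the half-space.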


4. Admissibility of Hotelling's T²-test. Let $Y, Z_1, \cdots, Z_m, U_1, \cdots, U_n$ be independently normally distributed random $p$-dimensional vectors (with $p < n$) with means given by

$$\mathcal{E} Y = y_0, \qquad \mathcal{E} Z_j = z_{j0}, \qquad \mathcal{E} U_j = 0 \tag{23}$$

and common unknown nonsingular covariance matrix $s_0$. Suppose $y_0$ and the $z_{j0}$ are also unknown. Hotelling's $T^2$-test for the hypothesis $H_0: y_0 = 0$ is to accept $H_0$ if and only if

$$Y'\left(YY' + \sum_j U_j U_j'\right)^{-1} Y \le c, \tag{24}$$

where the positive constant $c$ is chosen so as to give the desired significance level. We shall show that this test is admissible as a test against unrestricted alternatives.
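Form (24) differs from the more familiar $T^2 = Y'(\sum_j U_jU_j')^{-1}Y$ only by a monotone transformation. When $S = \sum_j U_jU_j'$ is nonsingular, the Sherman-Morrison formula gives $Y'(YY' + S)^{-1}Y = T^2/(1 + T^2)$, so (24) is the usual $T^2$ acceptance region with a correspondingly adjusted constant. The Python sketch below checks this identity in exact rational arithmetic. The vectors are arbitrary illustrative data, and `solve` is a small helper written only for this example.

```python
from fractions import Fraction


def solve(m, b):
    """Solve m x = b by Gauss-Jordan elimination over the rationals (m assumed nonsingular)."""
    n = len(m)
    a = [[Fraction(v) for v in row] + [Fraction(bi)] for row, bi in zip(m, b)]
    for col in range(n):
        piv = next(r for r in range(col, n) if a[r][col] != 0)
        a[col], a[piv] = a[piv], a[col]
        a[col] = [v / a[col][col] for v in a[col]]
        for r in range(n):
            if r != col and a[r][col] != 0:
                factor = a[r][col]
                a[r] = [v - factor * w for v, w in zip(a[r], a[col])]
    return [row[n] for row in a]


def outer_sum(vectors):
    """Sum of v v' over the given p-vectors."""
    p = len(vectors[0])
    return [[sum(Fraction(v[i]) * v[j] for v in vectors) for j in range(p)] for i in range(p)]


def quadratic(y, m):
    """y' m^{-1} y."""
    return sum(Fraction(yi) * xi for yi, xi in zip(y, solve(m, y)))


def statistic_24(y, us):
    """Left-hand side of (24): Y'(YY' + sum_j U_j U_j')^{-1} Y."""
    return quadratic(y, outer_sum(us + [y]))


def t_squared(y, us):
    """The usual form Y'(sum_j U_j U_j')^{-1} Y."""
    return quadratic(y, outer_sum(us))


y = [1, 2]
us = [[1, 0], [0, 1], [1, 1]]          # n = 3 > p = 2
t2 = t_squared(y, us)
print(t2, statistic_24(y, us), t2 / (1 + t2))
```

Writing the region with $YY'$ inside the inverse, as in (24), is what makes it expressible through the single matrix $s - \sum_j z_jz_j' = \sum_j u_ju_j' + yy'$ in the sample space used below.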

The joint probability density function of $Y, Z_1, \cdots, Z_m, U_1, \cdots, U_n$ (with respect to ordinary Lebesgue measure $\nu$ in the $(m + n + 1)p$-dimensional coordinate space) is

$$\begin{aligned} &\frac{\exp\left\{-\tfrac{1}{2}\left[y_0' s_0^{-1} y_0 + \sum_j z_{j0}' s_0^{-1} z_{j0}\right]\right\}}{(2\pi)^{(m+n+1)p/2} (\det s_0)^{(m+n+1)/2}} \\ &\qquad \times \exp\left\{-\tfrac{1}{2} \operatorname{tr} s_0^{-1}\left[\sum_j u_j u_j' + \sum_j z_j z_j' + yy'\right] + y_0' s_0^{-1} y + \sum_j z_{j0}' s_0^{-1} z_j\right\}. \end{aligned} \tag{25}$$

Thus the given family of distributions is equivalent to an exponential family $(\mathcal{X}, \mu, \Theta, P)$. $\mathcal{X}$ is the $[p(p+1)/2 + (m+1)p]$-dimensional space of all $(s, y, z_1, \cdots, z_m)$, where $s$ is a symmetric $p \times p$ matrix and $y, z_1, \cdots, z_m$ are $p$-dimensional vectors. The measure $\mu$ is given by

$$\mu(A) = \nu(f^{-1}A), \tag{26}$$

where $f$ is the function on the original $(m + n + 1)p$-dimensional space to $\mathcal{X}$ defined by

$$f(y, z_1, \cdots, z_m, u_1, \cdots, u_n) = \left(\sum_j u_j u_j' + \sum_j z_j z_j' + yy',\ y,\ z_1, \cdots, z_m\right). \tag{27}$$

It is convenient to designate by $(\Gamma, \eta, \zeta_1, \cdots, \zeta_m)$, with $\Gamma$ a $p \times p$ symmetric matrix and $\eta, \zeta_1, \cdots, \zeta_m$ $p$-dimensional vectors, the element of the adjoint space $\mathcal{X}'$ defined by

$$(\Gamma, \eta, \zeta_1, \cdots, \zeta_m)(s, y, z_1, \cdots, z_m) = -\tfrac{1}{2} \operatorname{tr} \Gamma s + \eta' y + \sum_j \zeta_j' z_j. \tag{28}$$

The parameter space $\Theta$ consists of the $(\Gamma, \eta, \zeta_1, \cdots, \zeta_m)$ with $\Gamma$ positive definite. The correspondence between this designation of the parameter point and that in terms of $(s_0, y_0, z_{10}, \cdots, z_{m0})$ is given by

$$\Gamma = s_0^{-1}, \qquad \eta = s_0^{-1} y_0, \qquad \zeta_j = s_0^{-1} z_{j0}. \tag{29}$$

$P$ is the function given by

$$P_{(\Gamma, \eta, \zeta_1, \cdots, \zeta_m)}(A) = \frac{1}{\psi(\Gamma, \eta, \zeta_1, \cdots, \zeta_m)} \int_A \exp\left[(\Gamma, \eta, \zeta_1, \cdots, \zeta_m)(s, y, z_1, \cdots, z_m)\right] d\mu(s, y, z_1, \cdots, z_m). \tag{30}$$
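In terms of the sample point in $\mathcal{X}$, the acceptance region for Hotelling's $T^2$-test is

$$\left\{(s, y, z_1, \cdots, z_m) : s - \sum_j z_j z_j' \text{ is positive definite and } y'\left(s - \sum_j z_j z_j'\right)^{-1} y \le c\right\}. \tag{31}$$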


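LEMMA. The set (31) is contained in the intersection of all half-spaces of the form

$$\left\{(s, y, z_1, \cdots, z_m) : \eta' y + \sum_j k_j \eta' z_j - \tfrac{1}{2} \operatorname{tr} \eta\eta' s \le \frac{c + \sum_j k_j^2}{2}\right\} \tag{32}$$

and those of the form

$$\left\{(s, y, z_1, \cdots, z_m) : \sum_j k_j \eta' z_j - \tfrac{1}{2} \operatorname{tr} \eta\eta' s \le \tfrac{1}{2} \sum_j k_j^2\right\}, \tag{33}$$

where $\eta$ ranges over the set of all $p$-dimensional vectors different from 0 and $k_1, \cdots, k_m$ range over the real line. Also (31) differs from this intersection by a set of probability 0.

We first show that any point $(s, y, z_1, \cdots, z_m)$ in the set (31) lies in each of these half-spaces. Since $s - \sum_j z_j z_j'$ is positive definite, we have, by Schwarz' inequality,

$$\sum_j k_j \eta' z_j \le \sqrt{\sum_j k_j^2 \sum_j (\eta' z_j)^2} \le \tfrac{1}{2} \sum_j k_j^2 + \tfrac{1}{2} \operatorname{tr} \eta\eta' \sum_j z_j z_j' \le \tfrac{1}{2} \sum_j k_j^2 + \tfrac{1}{2} \operatorname{tr} \eta\eta' s, \tag{34}$$

so that (33) holds.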





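Since $y'(s - \sum_j z_j z_j')^{-1} y \le c$, we have, again by Schwarz' inequality,

$$\begin{aligned} \eta' y + \sum_j k_j \eta' z_j &\le \sqrt{\eta'\left(s - \sum_j z_j z_j'\right)\eta} \sqrt{y'\left(s - \sum_j z_j z_j'\right)^{-1} y} + \sqrt{\sum_j k_j^2} \sqrt{\sum_j (\eta' z_j)^2} \\ &\le \tfrac{1}{2} \eta'\left(s - \sum_j z_j z_j'\right)\eta + \tfrac{1}{2} y'\left(s - \sum_j z_j z_j'\right)^{-1} y + \tfrac{1}{2} \sum_j k_j^2 + \tfrac{1}{2} \sum_j (\eta' z_j)^2 \\ &\le \tfrac{1}{2} \operatorname{tr} \eta\eta' s + \tfrac{1}{2} c + \tfrac{1}{2} \sum_j k_j^2, \end{aligned}$$

so that (32) holds. Thus, (31) is contained in the intersection of all sets of the form (32) and (33).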

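Next, consider a point $(s, y, z_1, \cdots, z_m)$ for which (32) and (33) hold for all $\eta$ and $k_1, \cdots, k_m$. Putting $k_j = \eta' z_j$ in (33), we obtain

$$\sum_j (\eta' z_j)^2 - \tfrac{1}{2} \operatorname{tr} \eta\eta' s \le \tfrac{1}{2} \sum_j (\eta' z_j)^2, \tag{35}$$

i.e.,

$$\operatorname{tr} \eta\eta'\left(s - \sum_j z_j z_j'\right) \ge 0. \tag{36}$$

Since (36) holds for all $\eta$, it follows that $s - \sum_j z_j z_j'$ is positive semidefinite, and thus, except for a set of probability 0, positive definite. Then by (32), with $k_j = \eta' z_j$,

$$\eta' y + \sum_j (\eta' z_j)^2 - \tfrac{1}{2} \operatorname{tr} \eta\eta' s \le \tfrac{1}{2}\left[c + \sum_j (\eta' z_j)^2\right],$$

so that

$$\eta' y \le \tfrac{1}{2}\left[c + \operatorname{tr} \eta\eta'\left(s - \sum_j z_j z_j'\right)\right].$$

With

$$\eta = \left(s - \sum_j z_j z_j'\right)^{-1} y,$$

this becomes

$$y'\left(s - \sum_j z_j z_j'\right)^{-1} y \le \tfrac{1}{2} c + \tfrac{1}{2} y'\left(s - \sum_j z_j z_j'\right)^{-1} y, \tag{37}$$

that is, $y'(s - \sum_j z_j z_j')^{-1} y \le c$, which shows that the given point is in the set (31).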
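Both directions of the lemma rest on a single fact. Fix $(s, y, z_1, \cdots, z_m)$ with $M = s - \sum_j z_jz_j'$ positive definite. Then the largest value of $\eta'y + \sum_j k_j\eta'z_j - \tfrac12\operatorname{tr}\eta\eta's - \tfrac12\sum_j k_j^2$ over $\eta$ and $k_1, \cdots, k_m$ is $\tfrac12 y'M^{-1}y$, attained at $\eta = M^{-1}y$ and $k_j = \eta'z_j$. The Python sketch below checks this in exact arithmetic for one hypothetical data point, built as in (27) with $p = m = 2$ and $n = 3$, and for randomly drawn $\eta$ and $k$. The helper names are illustrative only.

```python
import random
from fractions import Fraction


def solve(m, b):
    """Solve m x = b by Gauss-Jordan elimination over the rationals (m assumed nonsingular)."""
    n = len(m)
    a = [[Fraction(v) for v in row] + [Fraction(bi)] for row, bi in zip(m, b)]
    for col in range(n):
        piv = next(r for r in range(col, n) if a[r][col] != 0)
        a[col], a[piv] = a[piv], a[col]
        a[col] = [v / a[col][col] for v in a[col]]
        for r in range(n):
            if r != col and a[r][col] != 0:
                factor = a[r][col]
                a[r] = [v - factor * w for v, w in zip(a[r], a[col])]
    return [row[n] for row in a]


def dot(u, v):
    return sum(Fraction(a) * b for a, b in zip(u, v))


def quad(u, m):
    """u' m u."""
    return sum(Fraction(u[i]) * m[i][j] * u[j] for i in range(len(u)) for j in range(len(u)))


def objective(eta, ks, s, y, zs):
    """Left side of (32) minus (1/2) sum k_j^2; tr(eta eta' s) is computed as eta' s eta."""
    return (dot(eta, y) + sum(k * dot(eta, z) for k, z in zip(ks, zs))
            - Fraction(1, 2) * quad(eta, s) - Fraction(1, 2) * sum(k * k for k in ks))


y = [1, 2]
zs = [[0, 1], [2, -1]]
us = [[1, 0], [0, 1], [1, 1]]
p = len(y)
# s = sum u u' + sum z z' + y y', the first coordinate of f in (27)
s = [[sum(Fraction(v[i]) * v[j] for v in us + zs + [y]) for j in range(p)] for i in range(p)]
m_mat = [[s[i][j] - sum(Fraction(z[i]) * z[j] for z in zs) for j in range(p)] for i in range(p)]
half_q = Fraction(1, 2) * dot(y, solve(m_mat, y))   # (1/2) y'(s - sum z z')^{-1} y

rng = random.Random(0)
never_exceeds = True
for _ in range(500):
    eta = [Fraction(rng.randint(-6, 6), rng.randint(1, 4)) for _ in range(p)]
    ks = [Fraction(rng.randint(-6, 6), rng.randint(1, 4)) for _ in zs]
    never_exceeds = never_exceeds and objective(eta, ks, s, y, zs) <= half_q

eta_star = solve(m_mat, y)
ks_star = [dot(eta_star, z) for z in zs]
print(half_q, never_exceeds, objective(eta_star, ks_star, s, y, zs) == half_q)
```

The bound is always respected, and it is attained at the stated $\eta$ and $k_j$. Taking $c = y'M^{-1}y$ therefore places the point exactly on the boundary of (31), which is the content of the two directions of the proof.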

We return to the proof of the admissibility of Hotelling's $T^2$-test. The set (31) is essentially the intersection of all sets of the form (32) and (33), whose defining relations can be rewritten in the notation of (28) as

$$(\eta\eta', \eta, k_1\eta, \cdots, k_m\eta)(s, y, z_1, \cdots, z_m) \le \tfrac{1}{2}\left[c + \sum_j k_j^2\right], \tag{38}$$

$$(\eta\eta', 0, k_1\eta, \cdots, k_m\eta)(s, y, z_1, \cdots, z_m) \le \tfrac{1}{2} \sum_j k_j^2. \tag{39}$$

Thus, if $\xi = (\Gamma, \eta, \zeta_1, \cdots, \zeta_m)$ is any point in $\mathcal{X}'$ for which

$$\{x : \xi x > c\} \cap A = \emptyset, \tag{40}$$

where $A$ is the intersection of the half-spaces mentioned in the lemma, then $\xi$ must be a limit of positive linear combinations of elements of the form (38) or (39). In particular, $\Gamma$ must be positive semidefinite. Consequently, for any parameter point

$$\theta_1 = \left(\Gamma^{(1)}, \eta^{(1)}, \zeta_1^{(1)}, \cdots, \zeta_m^{(1)}\right)$$

and any $\lambda > 0$, the first component $\Gamma^{(1)} + \lambda\Gamma$ of $\theta_1 + \lambda\xi$ must be positive definite and, for sufficiently large $\lambda$, the second component $\eta^{(1)} + \lambda\eta$ must be different from 0. The theorem of Section 3 enables us to conclude that $A$ is an admissible acceptance region.


5. Limitations of the method. Examining the proof of the theorem of Section 3 and the end of the proof of Section 4, we see that, in showing that for any test essentially different from a $T^2$-test but of the same size there exists an alternative for which the other test is worse, we have looked at parameter points which are arbitrarily far out, in particular, points for which $\eta'\Gamma^{-1}\eta = y_0' s_0^{-1} y_0$ is arbitrarily large. How large this has to be taken depends on the test with which we are comparing the $T^2$-test. This is unsatisfactory, since it leaves open the possibility that there is a test which is appreciably better than Hotelling's $T^2$-test for all values of $\eta'\Gamma^{-1}\eta$ which are of practical importance and worse only where both tests have power very close to 1.


A question which comes closer to answering our concerns in practice is whether Hotelling's $T^2$-test is admissible for testing $H_0$ against the class of alternatives $\eta'\Gamma^{-1}\eta = \lambda$, with $\lambda$ a given positive constant. The methods of this paper are completely inadequate for this purpose. In the case $p = 1$, $m = 0$ (Student's $t$-test with no unknown means as nuisance parameters), the affirmative answer is given in Lehmann and Stein [2]. For $p \ge 2$ or $m > 0$ the answer is unknown. For $p = 2$ it is not even known whether the appropriate $T^2$-test is minimax for the problem of testing against a given $\eta'\Gamma^{-1}\eta$, with constant losses $a, b > 0$ for errors of the first and second kinds. An example by the author [4] shows that this does not follow from the invariance of the problem under the full linear group and the minimax property of Hotelling's $T^2$-test among all invariant tests. The strongest known optimum property of Hotelling's $T^2$-test seems to be that of Simaika [3]: of all tests whose power depends only on $\eta'\Gamma^{-1}\eta$, it is uniformly most powerful.

Nevertheless, it is clear that, in most applications, Hotelling's $T^2$-test cannot be substantially improved upon for all $\eta, \Gamma$ with $\eta'\Gamma^{-1}\eta$ fixed. For if $n/p$ is large, the test is nearly equivalent to the $\chi^2$-test which results if one knows $\Gamma$. This $\chi^2$-test is of course admissible against such alternatives. The proof is essentially given by Wald [5].

For some common multivariate tests of composite hypotheses, the methods of the present paper give no information. For example, let $Y_1, \cdots, Y_n$, with $n \ge p$, be independently normally distributed random $p$-dimensional vectors with mean 0 and unknown nonsingular covariance matrix $s_0$. Suppose we want to test the hypothesis that the first coordinate of the $Y_j$ is independent of the last $(p - 1)$ coordinates, i.e., that $s_{012} = 0$, where

$$s_0 = \begin{pmatrix} s_{011} & s_{012} \\ s_{012}' & s_{022} \end{pmatrix} \tag{41}$$

with $s_{011}$ a $1 \times 1$ matrix. The usual test is to accept $H_0$ if the sample multiple correlation coefficient is small, i.e., if

$$S_{12} S_{22}^{-1} S_{12}' \le c S_{11}, \tag{42}$$

where $S = \sum_j Y_j Y_j'$. By a calculation analogous to that of the lemma in Section 4, one can show that the convex cone determined by (42) differs by a set of probability 0 from the intersection of all half-spaces of the form

$$\{s : -c s_{11} + 2 \xi' s_{12}' - \operatorname{tr} \xi\xi' s_{22} \le 0\}, \tag{43}$$

where $\xi$ ranges over the set of all non-zero $(p - 1)$-dimensional vectors. However, the matrix

$$\begin{pmatrix} c & -\xi' \\ -\xi & \xi\xi' \end{pmatrix}$$

is not positive semidefinite for $c < 1$, the only case of interest. A similar argument shows that the methods of this paper can never prove the admissibility of an acceptance region which is a cone when the alternative hypothesis is a subset of the class of normal distributions with mean 0.
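Both claims about (43) can be checked directly. First, the maximum of $-cs_{11} + 2\xi's_{12}' - \xi's_{22}\xi$ over $\xi$ is attained at $\xi = s_{22}^{-1}s_{12}'$ and equals $s_{12}s_{22}^{-1}s_{12}' - cs_{11}$, so requiring every half-space (43) to hold reproduces (42). Second, the vector $(1, \xi/\xi'\xi)$ gives the quadratic form of the displayed matrix the value $c - 1$, which is negative when $c < 1$. The Python sketch below verifies both facts for $p = 3$, using an arbitrary hypothetical matrix $s$ and constant $c$.

```python
from fractions import Fraction


def halfspace_value(s, c, xi):
    """-c s11 + 2 xi' s12' - xi' s22 xi for a 3x3 symmetric s and a 2-vector xi, as in (43)."""
    s11 = s[0][0]
    s12 = [s[0][1], s[0][2]]
    s22 = [[s[1][1], s[1][2]], [s[2][1], s[2][2]]]
    linear = sum(x * v for x, v in zip(xi, s12))
    quadratic = sum(xi[i] * s22[i][j] * xi[j] for i in range(2) for j in range(2))
    return -c * s11 + 2 * linear - quadratic


def maximizing_xi(s):
    """xi = s22^{-1} s12', with the 2x2 inverse written out."""
    a, b, d = s[1][1], s[1][2], s[2][2]
    det = a * d - b * b
    r1, r2 = s[0][1], s[0][2]
    return [(d * r1 - b * r2) / det, (a * r2 - b * r1) / det]


def matrix_form(c, xi, v):
    """v' M v for M = [[c, -xi'], [-xi, xi xi']] and v = (v0, v1, v2)."""
    xv = xi[0] * v[1] + xi[1] * v[2]
    return c * v[0] * v[0] - 2 * v[0] * xv + xv * xv


s = [[Fraction(4), Fraction(1), Fraction(1)],
     [Fraction(1), Fraction(2), Fraction(0)],
     [Fraction(1), Fraction(0), Fraction(3)]]
c = Fraction(1, 2)
xi_star = maximizing_xi(s)
r2_numerator = sum(x * v for x, v in zip(xi_star, [s[0][1], s[0][2]]))   # s12 s22^{-1} s12'
print(xi_star, halfspace_value(s, c, xi_star), r2_numerator - c * s[0][0])

xi = [Fraction(1), Fraction(2)]
norm2 = xi[0] ** 2 + xi[1] ** 2
witness = [Fraction(1), xi[0] / norm2, xi[1] / norm2]     # xi' b = 1 for b = xi / (xi' xi)
print(matrix_form(c, xi, witness))                         # equals c - 1
```

The negative value is the obstruction that matters: the second-degree part of the linear functional defining each half-space is not positive semidefinite, so the hypothesis of the theorem in Section 3 cannot be verified for this acceptance region.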


REFERENCES 


[1] A. Birnbaum, "Characterizations of complete classes of tests of some multiparametric hypotheses, with applications to likelihood ratio tests," Ann. Math. Stat., Vol. 26 (1955), pp. 21-36.

[2] E. Lehmann and C. Stein, "The admissibility of certain invariant statistical tests involving a translation parameter," Ann. Math. Stat., Vol. 24 (1953), pp. 473-479.

[3] J. Simaika, "On an optimum property of two important statistical tests," Biometrika, Vol. 32 (1941), p. 62.

[4] C. Stein, "On tests of certain hypotheses invariant under the full linear group" (abstract), Ann. Math. Stat., Vol. 26 (1955), p. 769.

[5] A. Wald, "Tests of statistical hypotheses concerning several parameters when the number of observations is large," Trans. Amer. Math. Soc., Vol. 54 (1943), pp. 426-482.





A METHOD OF CONSTRUCTING PARTIALLY BALANCED INCOMPLETE 
BLOCK DESIGNS 


By J. W. ARCHBOLD AND N. L. JOHNSON


University College, London 


1. Summary. Partially balanced incomplete block designs were introduced by Bose and Nair [1], who described a number of methods of constructing such designs. Among these methods there is one based on incidence properties of finite geometries. This uses the finite geometries associated with the Galois field $GF(p^n)$ with addition and multiplication (mod $p$). By weakening the geometrical structure (or, equivalently, by weakening the rules of addition and multiplication), it is possible to obtain new designs.

A basic feature of a finite projective geometry is that the coordinates are elements of a finite field. What we do here is to allow the coordinates to belong instead to a linear associative algebra $\mathfrak{A}$, of finite order $n$ and with modulus, over a finite field $F$. The procedure is summarized below and explained in more detail with regard to two designs. (For accounts of a similar geometrical theory using an infinite field, see [7], [8], [9].)


2. Introduction. It is well known [6] that the elements of $\mathfrak{A}$ can be regularly represented by $n \times n$ matrices with elements in $F$; such matrices are here said to belong to $\mathfrak{A}$. Corresponding to the fact that $\mathfrak{A}$ has order $n$, there is a set of $n \times n$ matrices $U_1, \cdots, U_n$ over $F$ such that the elements of $\mathfrak{A}$ are represented by those and only those matrices of the form $\lambda_1 U_1 + \cdots + \lambda_n U_n$, with $\lambda_1, \cdots, \lambda_n$ in $F$, and the existence of a modulus means that the $n \times n$ unit matrix $U$ belongs to the set.

A coordinate matrix $X$ is a matrix of $n$ rows and $n(h + 1)$ columns partitioned into $h + 1$ submatrices:

$$X = (X_0 \; X_1 \; \cdots \; X_h),$$

where $X_0, X_1, \cdots, X_h$ belong to $\mathfrak{A}$. $X$ defines a class of equivalent coordinate matrices, which consists of all matrices $AX$ with $A$ in $\mathfrak{A}$ and of rank $n$. A class has rank $r$ when any (and therefore every) member has rank $r$.

A projective space of dimension $h$ and rank $r$ over $\mathfrak{A}$ is a set, $S_{h,r}(\mathfrak{A})$, of elements (its points) in one-to-one correspondence with the classes of equivalent coordinate matrices of rank $r$ over $\mathfrak{A}$.

A set of $k$ points, with coordinate matrices $X^1, \cdots, X^k$ (the superscripts being used to distinguish between coordinate matrices), is said to be linearly dependent over $\mathfrak{A}$ when there exist matrices $A_1, \cdots, A_k$ belonging to $\mathfrak{A}$ and not all 0 such that

$$A_1 X^1 + \cdots + A_k X^k = 0.$$


Received April 5, 1955. 





When $\mathfrak{A}$ is not itself a field, it is possible for two points to be distinct yet linearly dependent (with $A_1$, $A_2$ both being not null). It may be noted that this kind of possibility does not occur in ordinary geometry.

A prime of rank $s$ in $S_{h,r}(\mathfrak{A})$ consists of all points subject to a relation $X_0 L_0 + \cdots + X_h L_h = 0$, where $L_0, \cdots, L_h$ belong to $\mathfrak{A}$ and the matrix

$$\begin{pmatrix} L_0 \\ \vdots \\ L_h \end{pmatrix}$$

has rank $s$. A prime may or may not be an $S_{h-1}$. Two primes in, say, an $S_{2,r}(\mathfrak{A})$ can meet in more than one point.

To obtain an incidence diagram showing which points of $S_{h,r}(\mathfrak{A})$ lie on which primes of given rank $s$, we need a finite algebra $\mathfrak{A}$ and must therefore take $F$ to be a finite field. We examine below the simplest cases, which arise when $F$ is $GF(2)$.

3. Algebras of dual numbers. The simplest kinds of finite algebra with modulus are algebras of dual numbers over $GF(2)$. Here, the finite groundfield has just the two elements 0 and 1, and the algebra is of order 2, having as a base two elements $u$ and $e$ such that

$$u^2 = u, \qquad ue = eu = e, \qquad e^2 = \alpha e + \beta u,$$

with $\alpha$ and $\beta$ in $GF(2)$; $u$ is the modulus, and there are just four elements in the algebra, namely, 0, $u$, $e$, and $f = u + e$.
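There are four cases to consider, according to the values given to $\alpha$ and $\beta$:

(i) If $\alpha = \beta = 0$, then $e^2 = 0$. We have then the parabolic dual numbers. In the regular matrix representation,

$$u \to U_1 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \qquad e \to U_2 = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}.$$

The non-zero elements multiply according to the table

$$\begin{aligned} &u^2 = u, \quad e^2 = 0, \quad f^2 = u, \\ &ef = fe = e, \\ &ue = eu = e, \\ &uf = fu = f. \end{aligned}$$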
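The table can be confirmed by multiplying the regular-representation matrices directly, reducing entries mod 2. The Python sketch below does this. Its helper names are illustrative and are not taken from the paper.

```python
def mat_mul_mod2(a, b):
    """Product of two 2x2 matrices over GF(2), as nested tuples."""
    return tuple(tuple(sum(a[i][k] * b[k][j] for k in range(2)) % 2 for j in range(2))
                 for i in range(2))


def mat_add_mod2(a, b):
    """Sum of two 2x2 matrices over GF(2)."""
    return tuple(tuple((a[i][j] + b[i][j]) % 2 for j in range(2)) for i in range(2))


U = ((1, 0), (0, 1))          # u -> U_1
E = ((0, 1), (0, 0))          # e -> U_2 in case (i)
F = mat_add_mod2(U, E)        # f = u + e
ZERO = ((0, 0), (0, 0))
NAME = {U: "u", E: "e", F: "f", ZERO: "0"}

table = {(NAME[x], NAME[y]): NAME[mat_mul_mod2(x, y)]
         for x in (U, E, F) for y in (U, E, F)}
for (x, y), product_name in sorted(table.items()):
    print(f"{x}{y} = {product_name}")
```

Every product lands back in $\{0, U, E, F\}$, which confirms that the four matrices are closed under multiplication and form the algebra of case (i).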
(ii) If $\alpha = 0$, $\beta = 1$, then $e^2 = u$ and $(e + u)^2 = 0$. The elements $u$ and $f$ form an alternative base to the algebra, which is thus seen to be isomorphic with (i) and therefore has nothing new for us.


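(iii) If $\alpha = 1$, $\beta = 0$, then $e^2 = e$ and $f^2 = (e + u)^2 = f$, while $ef = fe = 0$. This gives a new algebra for which

$$u \to U_1 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \qquad e \to U_2 = \begin{pmatrix} 0 & 1 \\ 0 & 1 \end{pmatrix}.$$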





(iv) If $\alpha = \beta = 1$, then $e^2 = e + u$. This algebra is a field, the inverses of $u$, $e$, $f$ being $u$, $f$, $e$; it has no interest here.
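The four cases can also be tabulated mechanically. The Python sketch below stores an element $au + be$ as the pair $(a, b)$ and multiplies using the rule $e^2 = \alpha e + \beta u$ over $GF(2)$. For each choice of $(\alpha, \beta)$ it reports $e^2$, $f^2$ and $ef$, and whether every non-zero element has an inverse. It is an illustrative check, not part of the paper's construction.

```python
def make_mul(alpha, beta):
    """Multiplication of a*u + b*e in the algebra with u^2 = u, ue = eu = e, e^2 = alpha*e + beta*u."""
    def mul(x, y):
        a, b = x
        c, d = y
        # (au + be)(cu + de) = ac u + (ad + bc) e + bd (alpha e + beta u), all mod 2
        return ((a * c + b * d * beta) % 2, (a * d + b * c + b * d * alpha) % 2)
    return mul


U, E, F = (1, 0), (0, 1), (1, 1)
NAME = {(0, 0): "0", U: "u", E: "e", F: "f"}


def describe(alpha, beta):
    """Squares, the product ef, and whether every non-zero element is invertible."""
    mul = make_mul(alpha, beta)
    nonzero = (U, E, F)
    is_field = all(any(mul(x, y) == U for y in nonzero) for x in nonzero)
    return {"e^2": NAME[mul(E, E)], "f^2": NAME[mul(F, F)], "ef": NAME[mul(E, F)], "field": is_field}


for alpha in (0, 1):
    for beta in (0, 1):
        print((alpha, beta), describe(alpha, beta))
```

In case (ii) the output shows $f^2 = 0$, so $f$ plays the role that $e$ plays in case (i). This is the isomorphism noted in the text. Only case (iv) reports a field.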


4. The parabolic case. The matrices $U$, $E$, $F$ representing $u$, $e$, $f$ are here

$$U = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \qquad E = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}, \qquad F = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}.$$

There is just one coordinate matrix $(X_0 \; X_1 \cdots X_h)$ of rank 0, namely, $(0 \; 0 \cdots 0)$. There are $2^{h+1} - 1$ possible matrices of rank 1; no two are equivalent, and in each of them every $X_j$ is either $E$ or 0. There remain $4^{h+1} - (2^{h+1} - 1) - 1$ possible matrices of rank 2; these fall into pairs of equivalent matrices $(Y_0 \; Y_1 \cdots Y_h)$ and $F(Y_0 \; Y_1 \cdots Y_h)$. Hence $S_{h,1}(\mathfrak{A})$ and $S_{h,2}(\mathfrak{A})$ contain, respectively, $2^{h+1} - 1$ and $\tfrac{1}{2}(4^{h+1} - 2^{h+1})$ points. For $h = 2$, these numbers are 7 and 28.
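The point counts can be confirmed by brute force. The Python sketch below enumerates every coordinate matrix $(X_0 \; X_1 \cdots X_h)$ with each $X_j \in \{0, U, E, F\}$ and computes its rank over $GF(2)$. It then identifies $X$ with $FX$; multiplication by $U$ changes nothing. An element is stored as the pair $(a, b)$, standing for $\begin{pmatrix} a & b \\ 0 & a \end{pmatrix}$. The helper names are illustrative.

```python
from itertools import product

ELEMENTS = [(0, 0), (1, 0), (0, 1), (1, 1)]   # 0, U, E, F as (a, b) = [[a, b], [0, a]]
UNITS = [(1, 0), (1, 1)]                      # U and F, the elements of rank 2


def mul(x, y):
    """[[a, b], [0, a]] [[c, d], [0, c]] = [[ac, ad + bc], [0, ac]] over GF(2)."""
    return (x[0] & y[0], (x[0] & y[1]) ^ (x[1] & y[0]))


def gf2_rank(rows):
    """Rank over GF(2) of a list of 0/1 rows, by elimination on integer bit masks."""
    vals = [int("".join(map(str, r)), 2) for r in rows]
    rank = 0
    while vals:
        pivot = max(vals)
        if pivot == 0:
            break
        vals.remove(pivot)
        rank += 1
        lead = pivot.bit_length() - 1
        vals = [v ^ pivot if (v >> lead) & 1 else v for v in vals]
    return rank


def coordinate_matrix(coords):
    """The 2 x 2(h+1) matrix (X_0 X_1 ... X_h)."""
    top, bottom = [], []
    for a, b in coords:
        top += [a, b]
        bottom += [0, a]
    return [top, bottom]


def count_classes(h):
    """Number of equivalence classes of coordinate matrices of each rank, for dimension h."""
    classes = {0: set(), 1: set(), 2: set()}
    for coords in product(ELEMENTS, repeat=h + 1):
        rank = gf2_rank(coordinate_matrix(coords))
        representative = min(tuple(mul(a, x) for x in coords) for a in UNITS)
        classes[rank].add(representative)
    return {rank: len(members) for rank, members in classes.items()}


for h in (1, 2, 3):
    print(h, count_classes(h), 2 ** (h + 1) - 1, (4 ** (h + 1) - 2 ** (h + 1)) // 2)
```

Multiplying by $F$ fixes a rank-1 matrix but never a rank-2 one. That is why the rank-1 classes are singletons while the rank-2 matrices pair off.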

Confining our attention now to the case $h = 2$, the 28 points $P_1, \cdots, P_{28}$ in $S_{2,2}(\mathfrak{A})$ may be assigned coordinate matrices as follows:

 1. (U 0 0)    2. (U E 0)    3. (U 0 E)    4. (U E E)
 5. (0 U 0)    6. (0 U E)    7. (E F 0)    8. (E F E)
 9. (0 0 U)   10. (E E U)   11. (0 E U)   12. (E 0 U)
13. (U U 0)   14. (U F E)   15. (U U E)   16. (U F 0)
17. (0 U U)   18. (E F F)   19. (E F U)   20. (0 U F)
21. (U U U)   22. (F U F)   23. (U F F)   24. (F F U)
25. (U 0 U)   26. (F 0 U)   27. (F E F)   28. (U E F)


Details regarding this tableau will be found in [5]. It is enough here to point out, as regards its structure, that any two matrices in the same row are linearly dependent and that any entry, say $(X_0\,X_1\,X_2)$, is related to the entry $(Y_0\,Y_1\,Y_2)$ beneath it (up to equivalence, the first row counting as following the seventh) by the transformation $Y_0 = X_2$, $Y_1 = X_0 + X_2$, $Y_2 = X_1$.
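The two structural properties can be checked directly. The sketch below is our own illustration, again using the hypothetical pair encoding (a, b) for $a u + b e$. It types in the 28 coordinate matrices of the tableau and confirms two things: the product $(X_0E\,X_1E\,X_2E)$ is the same along each row, which is why the entries of a row are linearly dependent; and the transformation carries each entry to a matrix equivalent (differing by the unit factor u or f) to the entry beneath it.

```python
# Parabolic dual numbers over GF(2): a*u + b*e encoded as (a, b), with e*e = 0.
ZERO, U, E, F = (0, 0), (1, 0), (0, 1), (1, 1)
SYMBOL = {"0": ZERO, "U": U, "E": E, "F": F}

TABLEAU = [
    ["U00", "UE0", "U0E", "UEE"],
    ["0U0", "0UE", "EF0", "EFE"],
    ["00U", "EEU", "0EU", "E0U"],
    ["UU0", "UFE", "UUE", "UF0"],
    ["0UU", "EFF", "EFU", "0UF"],
    ["UUU", "FUF", "UFF", "FFU"],
    ["U0U", "F0U", "FEF", "UEF"],
]


def add(x, y):
    return ((x[0] + y[0]) % 2, (x[1] + y[1]) % 2)


def mul(x, y):
    a, b = x
    c, d = y
    return ((a * c) % 2, (a * d + b * c) % 2)


def parse(entry):
    return tuple(SYMBOL[ch] for ch in entry)


def equivalent(X, Y):
    """X and Y are equivalent when Y = lam * X for a unit lam (u or f)."""
    return any(tuple(mul(lam, x) for x in X) == Y for lam in (U, F))


def step(X):
    """The transformation Y0 = X2, Y1 = X0 + X2, Y2 = X1."""
    x0, x1, x2 = X
    return (x2, add(x0, x2), x1)


def check_tableau():
    rows = [[parse(entry) for entry in row] for row in TABLEAU]
    for i, row in enumerate(rows):
        below = rows[(i + 1) % len(rows)]  # the first row follows the seventh
        if not all(equivalent(step(X), Y) for X, Y in zip(row, below)):
            return False
        if len({tuple(mul(E, x) for x in X) for X in row}) != 1:
            return False
    return True


print(check_tableau())
```

The equivalence test is needed because some entries, such as 11 and 12, are recorded as the f-multiple of the transformed matrix rather than the matrix itself.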

The same coordinate matrices, written as columns, and the same numbering specify the 28 primes $\pi_1, \dots, \pi_{28}$ of rank 2.

Diagram 1 shows which points lie on which primes. The numbers down the left-hand side of the diagram can be taken as referring to the primes and the numbers along the top as referring to the points. If $\pi_i$ contains $P_j$, the fact is registered by placing a cross where the row corresponding to $\pi_i$ meets the column corresponding to $P_j$. It will be recognized that this design is a group-divisible PBIB (as defined by Bose & Connor [2]; see also [3], [4]), with parameters

$$v = b = 28, \quad m = 7, \quad n = 4, \quad r = k = 6, \quad \lambda_1 = 2, \quad \lambda_2 = 1.$$

This corresponds to the fact that the primes divide into 7 sets of four and 
the primes in any one set have the property that any two of them meet in two 
distinct and yet linearly dependent points. Such a set has been called a quadri- 
lateral of rank 1. Any two primes belonging to different quadrilaterals of rank 1 
meet in just one point. 

The 28 points are divided, in a dual manner, into 7 quadrangles of rank 1. Any two points in one such quadrangle are joined by two primes which are distinct but linearly dependent. Any two points in different quadrangles of rank 1 are joined by just one prime.

PBIB DESIGNS

DIAGRAM 1
Incidence diagram for the primes of rank 2 in $S_2(\mathfrak{A})$
[Diagram not reproduced: the 28 primes of rank 2 against the 28 points, both grouped in fours.]
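The parameters of this group-divisible design can be recomputed from the tableau. The sketch below is our illustration and makes one assumption the section does not restate: the point $(X_0\,X_1\,X_2)$ lies on the prime with column $(Y_0\,Y_1\,Y_2)$ when $X_0Y_0 + X_1Y_1 + X_2Y_2 = 0$ in the algebra. The groups are taken to be the rows of the tableau, that is, the quadrangles of rank 1, and the pair encoding of $a u + b e$ is again hypothetical.

```python
from itertools import combinations

# Parabolic dual numbers over GF(2): a*u + b*e encoded as (a, b), with e*e = 0.
ZERO, U, E, F = (0, 0), (1, 0), (0, 1), (1, 1)
SYMBOL = {"0": ZERO, "U": U, "E": E, "F": F}
TABLEAU = [
    ["U00", "UE0", "U0E", "UEE"],
    ["0U0", "0UE", "EF0", "EFE"],
    ["00U", "EEU", "0EU", "E0U"],
    ["UU0", "UFE", "UUE", "UF0"],
    ["0UU", "EFF", "EFU", "0UF"],
    ["UUU", "FUF", "UFF", "FFU"],
    ["U0U", "F0U", "FEF", "UEF"],
]


def mul(x, y):
    a, b = x
    c, d = y
    return ((a * c) % 2, (a * d + b * c) % 2)


def dot(X, Y):
    """X_0 Y_0 + X_1 Y_1 + X_2 Y_2 in the algebra (zero means incidence, by assumption)."""
    total = ZERO
    for x, y in zip(X, Y):
        p = mul(x, y)
        total = ((total[0] + p[0]) % 2, (total[1] + p[1]) % 2)
    return total


points = [tuple(SYMBOL[ch] for ch in entry) for row in TABLEAU for entry in row]
primes = points  # the same matrices, read as columns
incidence = [[dot(X, Y) == ZERO for X in points] for Y in primes]  # incidence[prime][point]
group = [i // 4 for i in range(len(points))]  # rows of the tableau

replications = {sum(row[p] for row in incidence) for p in range(28)}
block_sizes = {sum(row) for row in incidence}
concurrences = {}
for p, q in combinations(range(28), 2):
    together = sum(row[p] and row[q] for row in incidence)
    key = "same group" if group[p] == group[q] else "different groups"
    concurrences.setdefault(key, set()).add(together)

print(replications, block_sizes, concurrences)
```

The output should show r = k = 6, two common primes for points in the same quadrangle and one otherwise.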

The two divisions of points and primes determine a subdivision of the diagram into 49 squares of size 4 × 4. Those which contain crosses contain them in the form of a PBIB, and there are three types of such subsidiary designs. These marked squares are themselves arranged as the elements in the well-known pattern associated with the ordinary finite projective geometry over GF(2). The patterns inside the 4 × 4 squares reflect the structure of the base of the algebra, while the pattern of the 4 × 4 squares reflects the structure of the groundfield.





DIAGRAM 2
Incidence diagram for the primes of rank 1 in $S_2(\mathfrak{A})$
[Diagram not reproduced: the 7 primes of rank 1 against the 28 points of rank 2.]


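There are seven primes of rank 1, $\sigma_1, \dots, \sigma_7$, in $S_2(\mathfrak{A})$, with coordinate matrices (written as columns) as follows:

$\sigma_1: (0\,0\,E)^T$, $\sigma_2: (E\,0\,0)^T$, $\sigma_3: (E\,E\,0)^T$, $\sigma_4: (E\,E\,E)^T$, $\sigma_5: (0\,E\,E)^T$, $\sigma_6: (E\,0\,E)^T$, $\sigma_7: (0\,E\,0)^T$.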


Each prime of rank 1 consists of the 4 sides of a quadrilateral of rank 1 and thus contains 12 points of the space. The incidence diagram is given in Diagram 2.

This design is, of course, a simple form of group-divisible PBIB of the type described by Bose and Connor [2].

In $S_{1,2}(\mathfrak{A})$ there are 7 points, $Q_1, \dots, Q_7$, with the coordinate matrices

$Q_1: (0\,0\,E)$, $Q_2: (E\,0\,0)$, $Q_3: (E\,E\,0)$,
$Q_4: (E\,E\,E)$, $Q_5: (0\,E\,E)$, $Q_6: (E\,0\,E)$, $Q_7: (0\,E\,0)$.

There are 28 primes of rank 2, and the incidence diagram for these is obtained by interchanging the rows and columns in the diagram for primes of rank 1 in $S_2(\mathfrak{A})$.

Each prime of rank 1 in $S_2(\mathfrak{A})$ contains all of the points $Q_1, \dots, Q_7$. The incidence diagram for these primes is therefore just a 7 × 7 array of crosses.


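5. The non-parabolic case. The matrices U, E, F representing u, e, f are now

$$U = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \qquad E = \begin{pmatrix} 0 & 1 \\ 0 & 1 \end{pmatrix}, \qquad F = \begin{pmatrix} 1 & 1 \\ 0 & 0 \end{pmatrix}.$$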


There is just one coordinate matrix $(X_0\,X_1 \cdots X_h)$ of rank 0. There is one family of $2^{h+1} - 1$ possible matrices of rank 1, no two being equivalent, in each of which every $X_i$ is either E or 0; another such family is obtained by replacing E everywhere by F; and there are no more primes of rank 1. There remain $4^{h+1} - 2(2^{h+1} - 1) - 1 = (2^{h+1} - 1)^2$ possible matrices of rank 2, and no two are equivalent. Hence, $S_{1,h}(\mathfrak{A})$ contains two families each of $2^{h+1} - 1$ points, and $S_{2,h}(\mathfrak{A})$ contains $(2^{h+1} - 1)^2$ points. For h = 2, these numbers are 7 and 49.
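As in the parabolic case, the census can be reproduced by enumeration. The sketch below is our illustration, using the same hypothetical pair encoding of $a u + b e$ and the matrices U, E, F of case (iii). It groups all coordinate matrices by their rank over GF(2). Since u is the only unit of this algebra, no two distinct matrices are equivalent, so the number of rank-2 matrices is also the number of rank-2 points.

```python
from itertools import product

# Hypothetical encoding: a*u + b*e is the pair (a, b) over GF(2); case (iii), e*e = e.
ELEMENTS = [(0, 0), (1, 0), (0, 1), (1, 1)]  # 0, u, e, f


def rep(x):
    """Regular representation of a*u + b*e when e*e = e (rows: images of u and e)."""
    a, b = x
    return ((a, b), (0, (a + b) % 2))


def coordinate_rank(X):
    row1 = tuple(v for x in X for v in rep(x)[0])
    row2 = tuple(v for x in X for v in rep(x)[1])
    if not any(row1) and not any(row2):
        return 0
    if not any(row1) or not any(row2) or row1 == row2:
        return 1
    return 2


def census(h):
    """Group all coordinate matrices (X_0 ... X_h) by rank."""
    by_rank = {0: [], 1: [], 2: []}
    for X in product(ELEMENTS, repeat=h + 1):
        by_rank[coordinate_rank(X)].append(X)
    return by_rank


for h in (1, 2, 3):
    by_rank = census(h)
    e_family = [X for X in by_rank[1] if set(X) <= {(0, 0), (0, 1)}]
    print(h, len(e_family), len(by_rank[1]) - len(e_family), len(by_rank[2]))
```

For h = 2 this prints two families of 7 rank-1 matrices and 49 matrices of rank 2.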

Again confining attention to the case h = 2, the 49 points, $P_1, \dots, P_{49}$, in $S_{2,2}(\mathfrak{A})$ may be assigned coordinate matrices as follows:

 1. (U00)    2. (UF0)    3. (U0F)    4. (EF0)    5. (EFF)    6. (UFF)    7. (E0F)
 8. (UU0)    9. (UUF)   10. (EU0)   11. (EEF)   12. (UEF)   13. (EUF)   14. (UE0)
15. (UUU)   16. (EUU)   17. (EEU)   18. (UEE)   19. (EUE)   20. (UEU)   21. (UUE)
22. (0UU)   23. (FEU)   24. (FEE)   25. (FUE)   26. (0EU)   27. (0UE)   28. (FUU)
29. (U0U)   30. (EFE)   31. (UFE)   32. (UFU)   33. (U0E)   34. (E0U)   35. (EFU)
36. (0U0)   37. (0EF)   38. (FUF)   39. (0UF)   40. (FU0)   41. (FE0)   42. (FEF)
43. (00U)   44. (F0E)   45. (0FU)   46. (F0U)   47. (FFU)   48. (FFE)   49. (0FE)
There are also 49 primes of rank 2, $\pi_1, \dots, \pi_{49}$, to which we may assign coordinate matrices according to the above scheme by writing the matrices as columns instead of rows.

The array of points has been organized in the following manner. The matrix of products $(X_0E\,X_1E\,X_2E)$ is the same for all matrices $(X_0\,X_1\,X_2)$ in any given row of the array. (We could alternatively have used F in this connection in place of E and so have obtained an alternative display of the points.) Then, also, any entry $(Y_0\,Y_1\,Y_2)$ in the array is derived from the entry $(X_0\,X_1\,X_2)$ immediately above it (the first row counts as following the seventh cyclically) by means of the homographic substitution of period 7:

$$Y_0 = X_0 + X_2, \qquad Y_1 = X_0, \qquad Y_2 = X_1.$$
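The organization of the 49-point array can be verified in a few lines. The sketch below is our illustration, with the hypothetical pair encoding of $a u + b e$ and $e^2 = e$. It checks three things: the substitution maps each entry exactly onto the entry beneath it (cyclically); the products with E are constant along each row; and the substitution has period 7 on every point.

```python
# Non-parabolic case (iii): a*u + b*e encoded as (a, b), with e*e = e (so e*f = 0).
ZERO, U, E, F = (0, 0), (1, 0), (0, 1), (1, 1)
SYMBOL = {"0": ZERO, "U": U, "E": E, "F": F}

POINTS = """
U00 UF0 U0F EF0 EFF UFF E0F
UU0 UUF EU0 EEF UEF EUF UE0
UUU EUU EEU UEE EUE UEU UUE
0UU FEU FEE FUE 0EU 0UE FUU
U0U EFE UFE UFU U0E E0U EFU
0U0 0EF FUF 0UF FU0 FE0 FEF
00U F0E 0FU F0U FFU FFE 0FE
"""
ROWS = [[tuple(SYMBOL[ch] for ch in word) for word in line.split()]
        for line in POINTS.strip().splitlines()]


def add(x, y):
    return ((x[0] + y[0]) % 2, (x[1] + y[1]) % 2)


def mul(x, y):
    a, b = x
    c, d = y
    return ((a * c) % 2, (a * d + b * c + b * d) % 2)


def substitution(X):
    """Y0 = X0 + X2, Y1 = X0, Y2 = X1."""
    x0, x1, x2 = X
    return (add(x0, x2), x0, x1)


def period(X):
    """Smallest n >= 1 with substitution applied n times returning X."""
    Y, n = substitution(X), 1
    while Y != X:
        Y, n = substitution(Y), n + 1
    return n


below_ok = all(substitution(X) == Y
               for i, row in enumerate(ROWS)
               for X, Y in zip(row, ROWS[(i + 1) % 7]))
rows_share_E = all(len({tuple(mul(x, E) for x in X) for X in row}) == 1 for row in ROWS)
periods = {period(X) for row in ROWS for X in row}
print(below_ok, rows_share_E, periods)
```

Unlike the parabolic tableau, no equivalence step is needed here, because u is the only unit.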


Diagram 3 shows the incidences of the points and primes of rank 2. To save space, only the first seven rows of the design are shown here. Each further set of rows can be obtained by moving each mark 7 columns cyclically to the right and 7 rows downwards (see the 28 × 28 pattern shown earlier), and repeating this process. In the completed design, the rows (which represent the primes) are numbered as follows:


26 21 35 42 44 10 
28 20 33 37 48 13 
23 18 30 40 45 9 
24 19 34 39 47 12 
22 15 29 36 43 8 
25 16 32 41 49 11 
27 17 31 38 46 14 
Each point lies on 9 primes and each prime contains 9 points. Any given prime is met just once by each of 36 primes and 3 times by each of 12 primes. These 12 primes fall into 6 pairs, those in any pair meeting the given prime in the same 3 points, while each point on the given prime belongs to 2 of the 6 pairs of primes. A corresponding dual arrangement is obtained by starting with any point.

DIAGRAM 3
First seven rows of the incidence diagram for points and primes of rank 2 in $S_2(\mathfrak{A})$
[Diagram not reproduced: the first seven primes of rank 2 against the 49 points of rank 2.]
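These incidence properties can be recomputed by enumeration. The sketch below is our illustration. It assumes that a point lies on a prime when $X_0Y_0 + X_1Y_1 + X_2Y_2 = 0$ in the algebra, an incidence rule not restated in this section, and it uses the hypothetical pair encoding of $a u + b e$ with $e^2 = e$. Rank-2 matrices are those not lying wholly in the E-family or the F-family.

```python
from collections import Counter
from itertools import product

# Case (iii): a*u + b*e encoded as the pair (a, b) over GF(2), with e*e = e.
ZERO = (0, 0)
ELEMENTS = [(0, 0), (1, 0), (0, 1), (1, 1)]  # 0, u, e, f
E_OR_0, F_OR_0 = {(0, 0), (0, 1)}, {(0, 0), (1, 1)}


def mul(x, y):
    a, b = x
    c, d = y
    return ((a * c) % 2, (a * d + b * c + b * d) % 2)


def dot(X, Y):
    total = ZERO
    for x, y in zip(X, Y):
        p = mul(x, y)
        total = ((total[0] + p[0]) % 2, (total[1] + p[1]) % 2)
    return total


nonzero = [X for X in product(ELEMENTS, repeat=3) if X != (ZERO, ZERO, ZERO)]
rank2 = [X for X in nonzero if not (set(X) <= E_OR_0 or set(X) <= F_OR_0)]
# Points and primes of rank 2 use the same 49 matrices (as rows and as columns).
on_prime = {Y: frozenset(X for X in rank2 if dot(X, Y) == ZERO) for Y in rank2}

points_per_prime = {len(pts) for pts in on_prime.values()}
primes_per_point = {sum(X in pts for pts in on_prime.values()) for X in rank2}
meeting_profiles = {
    tuple(sorted(Counter(len(on_prime[Y] & on_prime[Z]) for Z in rank2 if Z != Y).items()))
    for Y in rank2
}


def three_point_pairs(Y):
    """Group the primes meeting Y in 3 points by the 3 points they share with Y."""
    shared = Counter(on_prime[Y] & on_prime[Z] for Z in rank2
                     if Z != Y and len(on_prime[Y] & on_prime[Z]) == 3)
    return sorted(shared.values())


print(points_per_prime, primes_per_point, meeting_profiles)
print(three_point_pairs(rank2[0]))
```

The expected profile is 36 primes meeting a given prime once and 12 meeting it three times, the latter falling into 6 pairs.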

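These designs are PBIB with two associate classes but are not group divisible. For these designs, taking as first associates of a point the 36 points with which it shares just one prime,

$$b = v = 49, \quad r = k = 9, \quad n_1 = 36, \quad n_2 = 12, \quad \lambda_1 = 1, \quad \lambda_2 = 3,$$

$$p^1_{11} = 25, \quad p^1_{12} = p^1_{21} = 10, \quad p^1_{22} = 2, \qquad p^2_{11} = 30, \quad p^2_{12} = p^2_{21} = 6, \quad p^2_{22} = 5.$$

The dual design is of identical form.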


DIAGRAM 4
Incidence diagram for the primes of rank 1 in $S_2(\mathfrak{A})$
[Diagram not reproduced: the 14 primes of rank 1 against the 49 points of rank 2.]


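There are 14 primes of rank 1, $\sigma_1, \dots, \sigma_{14}$, with coordinate matrices (written as columns) as follows:

$\sigma_1: (0\,E\,E)^T$, $\sigma_2: (E\,E\,E)^T$, $\sigma_3: (E\,0\,E)^T$, $\sigma_4: (E\,0\,0)^T$, $\sigma_5: (0\,0\,E)^T$, $\sigma_6: (0\,E\,0)^T$, $\sigma_7: (E\,E\,0)^T$,
$\sigma_8: (0\,0\,F)^T$, $\sigma_9: (F\,F\,F)^T$, $\sigma_{10}: (F\,0\,F)^T$, $\sigma_{11}: (F\,0\,0)^T$, $\sigma_{12}: (0\,F\,F)^T$, $\sigma_{13}: (F\,F\,0)^T$, $\sigma_{14}: (0\,F\,0)^T$.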


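The incidence diagram is shown in Diagram 4. Each prime of rank 1 contains 21 points to which 7 primes of rank 2 contribute equally; this may be expected from such identities as

$$\begin{pmatrix} E \\ 0 \\ 0 \end{pmatrix} = \begin{pmatrix} U \\ 0 \\ 0 \end{pmatrix} E = \begin{pmatrix} U \\ F \\ 0 \end{pmatrix} E = \begin{pmatrix} U \\ 0 \\ F \end{pmatrix} E = \begin{pmatrix} U \\ F \\ F \end{pmatrix} E = \begin{pmatrix} E \\ 0 \\ F \end{pmatrix} E = \begin{pmatrix} E \\ F \\ 0 \end{pmatrix} E = \begin{pmatrix} E \\ F \\ F \end{pmatrix} E.$$

These 7 primes together contribute all the points of the associated prime of rank 1 each 3 times.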

Each prime of rank 1 is met 7 times by each of the 6 other primes with which it is grouped in the diagram and 9 times by each of the 7 primes of the other group.

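This design is a PBIB with $b = 14$, $v = 49$, $r = 6$, $k = 21$, $\lambda_1 = 4$, $\lambda_2 = 2$, and

$$p^1_{11} = 5, \quad p^1_{12} = p^1_{21} = 6, \quad p^1_{22} = 30, \qquad p^2_{11} = 2, \quad p^2_{12} = p^2_{21} = 10, \quad p^2_{22} = 25.$$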
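The parameters of this design, including the $p^k_{ij}$, can be recomputed by enumeration. The sketch below is our illustration. It assumes the incidence rule $X_0Y_0 + X_1Y_1 + X_2Y_2 = 0$ (not restated in this section) and the hypothetical pair encoding of $a u + b e$ with $e^2 = e$. It treats the 49 points of rank 2 as varieties and the 14 primes of rank 1 as blocks, and calls two varieties first associates when they occur together $\lambda_1 = 4$ times.

```python
from collections import Counter
from itertools import product

# Case (iii): a*u + b*e encoded as the pair (a, b) over GF(2), with e*e = e.
ZERO = (0, 0)
ELEMENTS = [(0, 0), (1, 0), (0, 1), (1, 1)]  # 0, u, e, f
E_OR_0, F_OR_0 = {(0, 0), (0, 1)}, {(0, 0), (1, 1)}


def mul(x, y):
    a, b = x
    c, d = y
    return ((a * c) % 2, (a * d + b * c + b * d) % 2)


def dot(X, Y):
    total = ZERO
    for x, y in zip(X, Y):
        p = mul(x, y)
        total = ((total[0] + p[0]) % 2, (total[1] + p[1]) % 2)
    return total


nonzero = [X for X in product(ELEMENTS, repeat=3) if X != (ZERO, ZERO, ZERO)]
rank1 = [X for X in nonzero if set(X) <= E_OR_0 or set(X) <= F_OR_0]
rank2 = [X for X in nonzero if X not in rank1]

# Varieties: the 49 points of rank 2.  Blocks: the 14 primes of rank 1.
blocks = [{i for i, X in enumerate(rank2) if dot(X, S) == ZERO} for S in rank1]
v = len(rank2)
block_sizes = {len(block) for block in blocks}
replications = {sum(i in block for block in blocks) for i in range(v)}
lam = [[sum(i in block and j in block for block in blocks) for j in range(v)] for i in range(v)]

# First associates: varieties occurring together 4 times; otherwise second associates.
assoc = [[1 if lam[i][j] == 4 else 2 for j in range(v)] for i in range(v)]


def p_counts(i, j):
    """(p_11, p_12, p_21, p_22) for the pair of varieties i, j."""
    c = Counter((assoc[i][t], assoc[j][t]) for t in range(v) if t not in (i, j))
    return c[(1, 1)], c[(1, 2)], c[(2, 1)], c[(2, 2)]


p_by_class = {}
for i in range(v):
    for j in range(i + 1, v):
        p_by_class.setdefault(assoc[i][j], set()).add(p_counts(i, j))
print(block_sizes, replications, p_by_class)
```

Each first-associate pair should give (5, 6, 6, 30) and each second-associate pair (2, 10, 10, 25).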


DIAGRAM 5
Incidence diagram for primes of rank 1 in $S_2(\mathfrak{A})$
[Diagram not reproduced: the 14 primes of rank 1 against the 14 points of rank 1.]


Its dual (regarding columns as "blocks" and rows as "varieties") is a group-divisible PBIB with $b = 49$, $v = 14$, $r = 21$, $k = 6$, $\lambda_1 = 7$, $\lambda_2 = 9$, and

$$p^1_{11} = 5, \quad p^1_{12} = p^1_{21} = 0, \quad p^1_{22} = 7, \qquad p^2_{11} = 0, \quad p^2_{12} = p^2_{21} = 6, \quad p^2_{22} = 0.$$


The space $S_{1,2}(\mathfrak{A})$ contains 14 points, $Q_1, \dots, Q_{14}$, with coordinate matrices

 1. (0EE)    2. (EEE)    3. (E0E)    4. (E00)    5. (0E0)    6. (00E)    7. (EE0)
 8. (00F)    9. (FFF)   10. (F0F)   11. (F00)   12. (0FF)   13. (FF0)   14. (0F0)


The previous diagram, with rows and columns interchanged, shows the incidences between the points of this space and its 49 primes of rank 2.

Diagram 5 shows the incidences for the primes of rank 1 (which we can take to be $\sigma_1, \dots, \sigma_{14}$ as above). Each prime of rank 1 contains 10 points. It is met 8 times by each of the other 6 primes in its own set and 6 times by each of the 7 primes in the other set.
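The counts 10, 8 and 6 can be checked by enumeration. The sketch below is our illustration and rests on the same assumptions as before: the incidence rule $X_0Y_0 + X_1Y_1 + X_2Y_2 = 0$ and the hypothetical pair encoding of $a u + b e$ with $e^2 = e$. The 14 matrices of rank 1 serve both as points and as primes, and a prime's "set" is its E-family or F-family.

```python
from itertools import combinations, product

# Case (iii): a*u + b*e encoded as the pair (a, b) over GF(2), with e*e = e.
ZERO = (0, 0)
ELEMENTS = [(0, 0), (1, 0), (0, 1), (1, 1)]  # 0, u, e, f
E_OR_0, F_OR_0 = {(0, 0), (0, 1)}, {(0, 0), (1, 1)}


def mul(x, y):
    a, b = x
    c, d = y
    return ((a * c) % 2, (a * d + b * c + b * d) % 2)


def dot(X, Y):
    total = ZERO
    for x, y in zip(X, Y):
        p = mul(x, y)
        total = ((total[0] + p[0]) % 2, (total[1] + p[1]) % 2)
    return total


rank1 = [X for X in product(ELEMENTS, repeat=3)
         if X != (ZERO, ZERO, ZERO) and (set(X) <= E_OR_0 or set(X) <= F_OR_0)]
family = {X: "E" if set(X) <= E_OR_0 else "F" for X in rank1}

# The same 14 matrices serve as points (rows) and as primes of rank 1 (columns).
contains = {S: {Q for Q in rank1 if dot(Q, S) == ZERO} for S in rank1}
points_per_prime = {len(pts) for pts in contains.values()}
meetings = {}
for S, T in combinations(rank1, 2):
    key = "same set" if family[S] == family[T] else "other set"
    meetings.setdefault(key, set()).add(len(contains[S] & contains[T]))
print(points_per_prime, meetings)
```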

It is, of course, of no special interest as a PBIB design. 


REFERENCES 


[1] R. C. Bose and K. R. Nair, "Partially balanced incomplete block designs," Sankhyā, Vol. 4 (1939), pp. 337-372.

[2] R. C. Bose and W. S. Connor, "Combinatorial properties of group divisible incomplete block designs," Ann. Math. Stat., Vol. 23 (1952), pp. 367-383.

[3] R. C. Bose, S. S. Shrikhande, and K. N. Bhattacharya, "On the construction of group divisible incomplete block designs," Ann. Math. Stat., Vol. 24 (1953), pp. 167-195.

[4] R. C. Bose and T. Shimamoto, "Classification and analysis of partially balanced incomplete block designs with two associate classes," J. Amer. Stat. Assn., Vol. 47 (1952), pp. 151-184.

[5] J. W. Archbold, "Projective geometry over an algebra," Mathematika, Vol. 2 (1955), p. 105.

[6] Birkhoff and MacLane, Survey of Modern Algebra, New York, 1948, pp. 214-217.

[7] C. Segre, "Le geometrie proiettive nei campi duali," Atti Accad. Sci. Torino, Vol. 47, pp. 308, 384.

[8] N. Spampinato, Teoria delle caratteristiche in un’ algebra dotata di modulo ed S, 
ipercomplessi, Mem. R. Acc. Lincei, Ser. 6, T. 6 (1936), 24. 

[9] N. Spampinato, Sulla geometria dell’ S, biduale proiettivo, Ibid, T. 7 (1938), 30. 





A NOTE ON COMBINED INTERBLOCK AND INTRABLOCK ESTIMATION
IN INCOMPLETE BLOCK DESIGNS¹


By D. A. Sprott


1. Introduction. In some experiments it is necessary to use incomplete block designs in order to keep the block size small. One example is the balanced incomplete block design [1], in which every variety occurs in r blocks and every pair of varieties occurs in $\lambda$ blocks. Thus it is possible to estimate a variety difference $v_i - v_j$ in the $\lambda$ blocks containing both varieties, but estimates of $v_i - v_j$ from any other blocks will be confounded with blocks. The former estimates (free from block effects) are the intrablock estimates, denoted by $\hat v_i$ (or $\hat v_i - \hat v_j$ for the differences), and are obtained by minimizing

$$\sum_{i,j} (y_{ij} - \mu - v_i - b_j)^2,$$

where $y_{ij}$ is the observation corresponding to variety i in block j and is an $N(\mu + v_i + b_j, \sigma^2)$ variate.

If the block effects $b_j$ are random $N(0, \sigma_b^2)$ variates and are small, it is possible to extract information about variety differences from blocks which do not contain both varieties. This gives rise to interblock recovery of information and the interblock estimates $v''_i$ discussed in [9]. The interblock estimates are obtained by minimizing

$$\sum_j \Bigl\{\sum_i (y_{ij} - \mu - v_i)\Bigr\}^2,$$

where the $y_{ij}$ are $N(\mu + v_i, \sigma_b^2 + \sigma^2)$ variates. When the recovery of interblock information was first discussed [9], the interblock and intrablock estimates were combined to form the "best combined estimate" $v'_i$, the linear combination of $\hat v_i$ and $v''_i$ having minimum variance; that is,

(1.1) $$v'_i = \frac{\hat v_i \,(\operatorname{var} v''_i) + v''_i \,(\operatorname{var} \hat v_i)}{\operatorname{var} \hat v_i + \operatorname{var} v''_i}.$$

The variance of $v'_i$ defined in this way is

$$(\operatorname{var} v''_i)(\operatorname{var} \hat v_i)/(\operatorname{var} v''_i + \operatorname{var} \hat v_i).$$
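Formula (1.1) is the familiar rule for combining two independent unbiased estimates: each is weighted by the variance of the other. The Python sketch below is only an illustration of that arithmetic; the function name `combine` and the numbers in the example are hypothetical.

```python
def combine(intra, var_intra, inter, var_inter):
    """Method 1, eq. (1.1): minimum-variance linear combination of two independent
    unbiased estimates of the same variety effect. Returns (estimate, variance)."""
    total = var_intra + var_inter
    estimate = (intra * var_inter + inter * var_intra) / total
    variance = var_intra * var_inter / total
    return estimate, variance


est, var = combine(intra=2.0, var_intra=1.0, inter=4.0, var_inter=3.0)
print(est, var)  # 2.5 0.75
```

The weight $\operatorname{var} v''_i / (\operatorname{var} \hat v_i + \operatorname{var} v''_i)$ on $\hat v_i$ is the one that minimizes $w^2 \operatorname{var} \hat v_i + (1 - w)^2 \operatorname{var} v''_i$.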


However, it can be shown [4], [8] that the best combined linear estimates (that is, estimates which are linear functions of the observations, are functions of intrablock and interblock information, and have minimum variance) can be found by minimizing

(1.2) $$W \sum_{i,j} \Bigl(y_{ij} - \frac{B_j}{k} - v_i + \frac{1}{k}\sum_{(j)} v\Bigr)^2 + \frac{W'}{k} \sum_j \Bigl\{\sum_i (y_{ij} - \mu - v_i)\Bigr\}^2,$$


Received April 27, 1955. 
¹ Done under a grant from the National Research Council of Canada.





634 D. A. SPROTT 


where $W = 1/\sigma^2$, $W' = 1/(k\sigma_b^2 + \sigma^2)$, and $\sum_{(j)} v$ is the sum of the variety effects of varieties contained in block $b_j$. (Setting $W' = 0$ will give the intrablock equations and setting $W = 0$ will give the interblock equations.) Let the combined estimates found by minimizing (1.2) be denoted by $v^*_i$. There have thus been presented two possible methods of finding the best combined estimate of $v_i$; we shall denote them by Method 1 (yielding $v'_i$) and Method 2 (yielding $v^*_i$). Method 1 was the one first used [9] and is used in [5] and [6], all in connection with the balanced incomplete block design.

It is the purpose of this paper to show that Methods 1 and 2 are not the same ($v'_i \ne v^*_i$) in general; therefore Method 1 does not of itself yield the best combined estimate. The conditions for $v'_i = v^*_i$ and $(v_i - v_j)' = v^*_i - v^*_j$ will be derived, and the resulting designs will be balanced or special cases of partially balanced incomplete block designs [3], such as group-divisible designs [2].


2. Formation of the estimates. Suppose that there are v varieties occurring in b blocks of k distinct varieties each, so that each variety occurs r times and varieties $v_i$ and $v_j$ occur together $\lambda_{ij}$ times. The intrablock estimates $\hat v_i$ are obtained by minimizing

$$\sum_{i,j} (y_{ij} - \mu - v_i - b_j)^2$$

with respect to $\mu$, $v_i$, and $b_j$, subject to the restrictions $\sum v_i = \sum b_j = 0$. After eliminating the $b_j$ and $\mu$, the resulting equations are

$$c_i = kV_i - T_i = r(k-1)\hat v_i - \sum_{p \ne i} \lambda_{ip} \hat v_p,$$

where $V_i$ is the sum of the observations containing variety $v_i$, and $T_i$ is the sum of the block totals ($B_j$) of all blocks containing variety $v_i$. Since the $\hat v_i$ sum to zero, $\hat v_v$ can be replaced by $-(\hat v_1 + \hat v_2 + \cdots + \hat v_{v-1})$, the resulting equation being

$$c_i = [r(k-1) + \lambda_{iv}]\hat v_i - \sum_{p \ne i}^{v-1} (\lambda_{ip} - \lambda_{iv}) \hat v_p,$$

where $c_i = kV_i - T_i$ as before. This is a set of $(v - 1)$ equations for $\hat v_1, \hat v_2, \dots, \hat v_{v-1}$, and they can be written in the matric form

(2.1) $$C = A\hat V, \quad \text{where } A = (a_{ij}), \quad a_{ij} = -(\lambda_{ij} - \lambda_{iv}), \quad a_{ii} = r(k-1) + \lambda_{iv}.$$
Similarly, the interblock estimates $v''_i$ are obtained by minimizing

$$\sum_j \Bigl(B_j - k\mu - \sum_{(j)} v_i\Bigr)^2, \qquad \Bigl(\sum_i v_i = 0\Bigr),$$

where $B_j$ is the sum of the observations for block $b_j$ and $\sum_{(j)} v_i$ is the sum of the variety effects of varieties contained in block $b_j$. The resulting equations are

$$\tilde c_i = r v''_i + \sum_{p \ne i} \lambda_{ip} v''_p,$$





INCOMPLETE BLOCK DESIGNS 
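where $\tilde c_i = T_i - rk\mu''$ and $rv\mu'' = \sum_{i,j} y_{ij}$. Since $\sum_i v''_i = 0$, we get, as before,

$$\tilde c_i = (r - \lambda_{iv}) v''_i + \sum_{p \ne i}^{v-1} (\lambda_{ip} - \lambda_{iv}) v''_p.$$

This is a set of $(v - 1)$ equations for $v''_1, v''_2, \dots, v''_{v-1}$, and they can be written in matric form:

(2.2) $$\tilde C = \tilde A V'', \quad \text{where } \tilde A = (\tilde a_{ij}), \quad \tilde a_{ij} = \lambda_{ij} - \lambda_{iv}, \quad \tilde a_{ii} = r - \lambda_{iv}.$$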

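Using Method 2, the combined estimates $v^*_i$ are found by minimizing

$$W \sum_{i,j} \Bigl(y_{ij} - \frac{B_j}{k} - v_i + \frac{1}{k}\sum_{(j)} v\Bigr)^2 + \frac{W'}{k} \sum_j \Bigl(B_j - k\mu - \sum_{(j)} v_i\Bigr)^2.$$

The resulting equations for $v^*_1, v^*_2, \dots, v^*_{v-1}$ can be shown to be

(2.3) $$W(C - AV^*) + W'(\tilde C - \tilde A V^*) = 0.$$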
Using Method 1, the best linear combination of the estimates $\hat V$ and $V''$ is

$$V' = (K_i)\hat V + (1 - K_i)V'',$$

where $(K_i)$ is the diagonal matrix

$$\left(\frac{\operatorname{var} v''_i}{\operatorname{var} v''_i + \operatorname{var} \hat v_i}\right).$$

3. Condition for the equivalence of Methods 1 and 2 for all variety estimates. If Methods 1 and 2 both produce the same combined estimate, then

$$V^* = V' = (K_i)\hat V + (1 - K_i)V''.$$

Substituting this expression for $V^*$ into (2.3), we get

$$W\{C - A(K_i)\hat V - A(1 - K_i)V''\} + W'\{\tilde C - \tilde A(K_i)\hat V - \tilde A(1 - K_i)V''\} = 0.$$

Noting that $C = A\hat V$ and $\tilde C = \tilde A V''$, this becomes

$$W\{A\hat V - A(K_i)\hat V - A(1 - K_i)V''\} + W'\{\tilde A V'' - \tilde A(K_i)\hat V - \tilde A(1 - K_i)V''\} = 0;$$

that is,

$$[WA(1 - K_i) - W'\tilde A(K_i)]\hat V - [WA(1 - K_i) - W'\tilde A(K_i)]V'' = 0.$$


Because the intrablock estimates are statistically independent of the interblock estimates, there cannot be a linear relation connecting the $\hat v_i$ and the $v''_i$. Hence

$$[WA(1 - K_i) - W'\tilde A(K_i)]\hat V = 0.$$

But as $v_1, v_2, \dots, v_{v-1}$ are linearly independent, there cannot be a linear relation involving $\hat v_1, \hat v_2, \dots, \hat v_{v-1}$. Therefore

$$WA(1 - K_i) = W'\tilde A(K_i).$$


Since W and W' are scalar constants, multiplying the matrices together and equating the corresponding entries gives

$$W a_{ij}(1 - K_j) = W' \tilde a_{ij} K_j,$$

where $K_j$ and $(1 - K_j)$ are entries from the corresponding diagonal matrices; consequently, $a_{ij} = m_j \tilde a_{ij}$, where $m_j$ is a constant depending only on j. Substituting the appropriate expressions from (2.1) and (2.2) for $a_{ij}$ and $\tilde a_{ij}$ gives

$$-(\lambda_{ij} - \lambda_{iv}) = m_j(\lambda_{ij} - \lambda_{iv});$$

that is,

$$\lambda_{ij} = \lambda_{iv} \ (\text{all } i) \qquad \text{or} \qquad m_j = -1.$$

However, we also have $a_{jj} = m_j \tilde a_{jj}$, that is,

$$r(k-1) + \lambda_{jv} = m_j(r - \lambda_{jv}),$$

and therefore $m_j = -1$ is impossible, since it would give $rk = 0$. Hence

(3.1) $$\lambda_{ij} = \lambda_{iv} \quad \text{for all } i \text{ and } j.$$
But

$$\sum_{p \ne i}^{v-1} \lambda_{ip} = r(k-1) - \lambda_{iv},$$

while, because of (3.1),

$$\sum_{p \ne i}^{v-1} \lambda_{ip} = \sum_{p \ne i}^{v-1} \lambda_{iv} = (v - 2)\lambda_{iv}.$$

Thus $\lambda_{iv} = \lambda = r(k-1)/(v-1)$, a constant for all i. Therefore all

$$\lambda_{ij} = r(k-1)/(v-1) = \lambda,$$

and the design is completely balanced. Consequently, only for balanced incomplete block designs is $v'_i = v^*_i$ for all varieties.
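The condition reached in this section, $a_{ij} = m_j \tilde a_{ij}$ for every i and j, can be tested directly from a design's blocks. The sketch below is our illustration. It builds the matrices of (2.1) and (2.2) with the last variety eliminated and checks column proportionality exactly with rational arithmetic; it does not compute any estimates. The function names and the unbalanced cyclic example are hypothetical, and equal replication and block size are assumed.

```python
from fractions import Fraction
from itertools import combinations


def concurrence(v, blocks):
    """lam[i][j] = number of blocks containing both varieties i and j."""
    lam = [[0] * v for _ in range(v)]
    for block in blocks:
        for i, j in combinations(block, 2):
            lam[i][j] += 1
            lam[j][i] += 1
    return lam


def reduced_matrices(v, blocks):
    """A of (2.1) and A-tilde of (2.2), with the last variety eliminated.
    Assumes equal block size k and equal replication r."""
    k = len(blocks[0])
    r = sum(1 for block in blocks if 0 in block)
    lam = concurrence(v, blocks)
    last = v - 1
    A = [[0] * last for _ in range(last)]
    A_tilde = [[0] * last for _ in range(last)]
    for i in range(last):
        for j in range(last):
            if i == j:
                A[i][j] = r * (k - 1) + lam[i][last]
                A_tilde[i][j] = r - lam[i][last]
            else:
                A[i][j] = -(lam[i][j] - lam[i][last])
                A_tilde[i][j] = lam[i][j] - lam[i][last]
    return A, A_tilde


def columns_proportional(A, A_tilde):
    """True when a_ij = m_j * a~_ij for all i, j (the condition derived in Section 3)."""
    n = len(A)
    for j in range(n):
        if A_tilde[j][j] == 0:
            return False
        m_j = Fraction(A[j][j], A_tilde[j][j])
        if any(A[i][j] != m_j * A_tilde[i][j] for i in range(n)):
            return False
    return True


FANO = [(0, 1, 3), (1, 2, 4), (2, 3, 5), (3, 4, 6), (4, 5, 0), (5, 6, 1), (6, 0, 2)]
CYCLE = [(0, 1), (1, 2), (2, 3), (3, 0)]  # hypothetical unbalanced design
print(columns_proportional(*reduced_matrices(7, FANO)))   # True (balanced)
print(columns_proportional(*reduced_matrices(4, CYCLE)))  # False
```

For the balanced design (v = b = 7, r = k = 3, λ = 1) both matrices are diagonal, with $m_j = 7/2$. In the cyclic design the off-diagonal entries force $m_j = -1$, which the diagonal entries rule out.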


4. Estimates of variety differences.

THEOREM. In an incomplete block design $(v, b, r, k, \lambda_{ij})$ a necessary and sufficient condition that there exist a subset of varieties $v_1, v_2, \dots, v_a$ such that $(v_i - v_j)' = (v^*_i - v^*_j)$ for $i, j = 1, 2, \dots, a$, is that all pairs $v_i, v_j$ occur together a constant number $\lambda_0$ of times, and that any other variety $v_t$ occur a constant number $\lambda_t$ of times with $v_1, v_2, \dots, v_a$.
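The combinatorial condition in this theorem is easy to test for a given design and subset. The sketch below is our illustration. The function names are hypothetical, and so is the example design: six varieties in groups {0, 1}, {2, 3}, {4, 5}, with blocks formed by all triples taking one variety from each group. The sketch checks only the condition on the $\lambda_{ij}$, not the estimates themselves.

```python
from itertools import combinations, product


def concurrence(v, blocks):
    """lam[i][j] = number of blocks containing both varieties i and j."""
    lam = [[0] * v for _ in range(v)]
    for block in blocks:
        for i, j in combinations(block, 2):
            lam[i][j] += 1
            lam[j][i] += 1
    return lam


def subset_condition(v, blocks, subset):
    """All pairs within `subset` have the same concurrence, and every variety
    outside `subset` occurs equally often with each member of it."""
    lam = concurrence(v, blocks)
    if len({lam[i][j] for i, j in combinations(subset, 2)}) > 1:
        return False
    return all(len({lam[t][i] for i in subset}) == 1
               for t in range(v) if t not in subset)


# Hypothetical group-divisible example: groups {0,1}, {2,3}, {4,5};
# the blocks are all triples taking one variety from each group.
BLOCKS = list(product((0, 1), (2, 3), (4, 5)))
print(subset_condition(6, BLOCKS, [0, 1]))  # same group: True
print(subset_condition(6, BLOCKS, [0, 2]))  # different groups: False
```

In the example, varieties 0 and 2 fail the condition because variety 1 occurs with 2 twice but never with 0.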

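PROOF. First, equations analogous to those of Section 2 must be derived for the intrablock and interblock estimates of variety differences. Thus

$$c_i = r(k-1)\hat v_i - \sum_{p \ne i} \lambda_{ip} \hat v_p,$$

and as $\sum_{p \ne i} \lambda_{ip} = r(k-1)$, we have, writing $\hat d_p = \hat v_p - \hat v_1$ (so that $\hat d_1 = 0$),

$$c_i = r(k-1)\hat d_i - \sum_{p \ne i} \lambda_{ip} \hat d_p, \qquad c_1 = -\sum_{p \ne 1} \lambda_{1p} \hat d_p.$$

Subtracting,

$$c_i - c_1 = r(k-1)\hat d_i - \sum_{p \ne i} \lambda_{ip} \hat d_p + \sum_{p \ne 1} \lambda_{1p} \hat d_p = [r(k-1) + \lambda_{1i}]\hat d_i - \sum_{p \ne 1, i} (\lambda_{ip} - \lambda_{1p}) \hat d_p.$$

This is a set of $v - 1$ equations for $\hat d_2, \dots, \hat d_a, \hat d_{a+1}, \dots, \hat d_v$; they can be written in the partitioned matric form

(4.1) $$\begin{pmatrix} D_1 \\ D_2 \end{pmatrix} = \begin{pmatrix} A_1 & A_2 \\ A_3 & A_4 \end{pmatrix} \begin{pmatrix} \hat\Delta_1 \\ \hat\Delta_2 \end{pmatrix},$$

where $D_1$ is the column vector $(c_i - c_1)$, $i = 2, 3, \dots, a$, and $D_2$ is the column vector $(c_i - c_1)$, $i = a+1, a+2, \dots, v$; $\hat\Delta_1$ and $\hat\Delta_2$ are the corresponding column vectors of estimates $\hat d_i$; also,

$A_1 = (a_{it})$, where $a_{ii} = r(k-1) + \lambda_{1i}$ and $a_{it} = -(\lambda_{it} - \lambda_{1t})$ for $i \ne t$; $i, t = 2, 3, \dots, a$ (note that $\hat d_1 = 0$);
$A_2 = (a_{it})$, where $a_{it} = -(\lambda_{it} - \lambda_{1t})$; $i = 2, 3, \dots, a$; $t = a+1, a+2, \dots, v$;
$A_3 = (a_{it})$, where $a_{it} = -(\lambda_{it} - \lambda_{1t})$; $i = a+1, a+2, \dots, v$; $t = 2, 3, \dots, a$;
$A_4 = (a_{it})$, where $a_{ii} = r(k-1) + \lambda_{1i}$ and $a_{it} = -(\lambda_{it} - \lambda_{1t})$ for $i \ne t$; $i, t = a+1, a+2, \dots, v$.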
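The corresponding equations for the interblock estimates can be formed. Writing $d''_p = v''_p - v''_1$, and noting that $\sum_{p \ne i} \lambda_{ip} = r(k-1)$,

$$\tilde c_i = r v''_i + \sum_{p \ne i} \lambda_{ip} v''_p, \qquad \tilde c_i - rk v''_1 = r d''_i + \sum_{p \ne i} \lambda_{ip} d''_p.$$

Therefore,

$$\tilde c_1 - rk v''_1 = \sum_{p \ne 1} \lambda_{1p} d''_p,$$

and

$$\tilde c_i - \tilde c_1 = (r - \lambda_{1i}) d''_i + \sum_{p \ne 1, i} (\lambda_{ip} - \lambda_{1p}) d''_p.$$

Writing this in matric form,

(4.2) $$\begin{pmatrix} \tilde D_1 \\ \tilde D_2 \end{pmatrix} = \begin{pmatrix} \tilde A_1 & \tilde A_2 \\ \tilde A_3 & \tilde A_4 \end{pmatrix} \begin{pmatrix} \Delta''_1 \\ \Delta''_2 \end{pmatrix},$$

where the $\tilde A$ matrices are formed from the corresponding A matrices by replacing $r(k-1)$ by r and $\lambda_{ij}$ by $-\lambda_{ij}$. Thus, for example,

$\tilde A_1 = (\tilde a_{it})$, where $\tilde a_{ii} = r - \lambda_{1i}$ and $\tilde a_{it} = \lambda_{it} - \lambda_{1t}$ for $i \ne t$; $i, t = 2, 3, \dots, a$;

also, $\tilde A_2 = -A_2$.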
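Using Method (2), the combined estimates $d^*_i$ satisfy the equations (analogous to (2.3)):

(4.3) $$W\left[\begin{pmatrix} D_1 \\ D_2 \end{pmatrix} - \begin{pmatrix} A_1 & A_2 \\ A_3 & A_4 \end{pmatrix}\begin{pmatrix} \Delta^*_1 \\ \Delta^*_2 \end{pmatrix}\right] + W'\left[\begin{pmatrix} \tilde D_1 \\ \tilde D_2 \end{pmatrix} - \begin{pmatrix} \tilde A_1 & \tilde A_2 \\ \tilde A_3 & \tilde A_4 \end{pmatrix}\begin{pmatrix} \Delta^*_1 \\ \Delta^*_2 \end{pmatrix}\right] = 0.$$

The theorem requires that $(v_i - v_1)' = v^*_i - v^*_1$ for $i \le a$; that is,

$$\Delta^*_1 = (K_i)\hat\Delta_1 + (1 - K_i)\Delta''_1,$$

where $(K_i)$ is the diagonal matrix

$$\left(\frac{\operatorname{var}(v''_i - v''_1)}{\operatorname{var}(\hat v_i - \hat v_1) + \operatorname{var}(v''_i - v''_1)}\right).$$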

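This expression can be substituted into (4.3) and the matrices multiplied together. Noting from (4.1) and (4.2) that

$$D_1 = A_1\hat\Delta_1 + A_2\hat\Delta_2, \quad D_2 = A_3\hat\Delta_1 + A_4\hat\Delta_2, \quad \tilde D_1 = \tilde A_1\Delta''_1 + \tilde A_2\Delta''_2, \quad \tilde D_2 = \tilde A_3\Delta''_1 + \tilde A_4\Delta''_2,$$

and rearranging the terms as in Section 3, we get

(4.4) $$\{WA_1(1 - K_i) - W'\tilde A_1(K_i)\}(\hat\Delta_1 - \Delta''_1) + WA_2\hat\Delta_2 + W'\tilde A_2\Delta''_2 = (WA_2 + W'\tilde A_2)\Delta^*_2,$$

(4.5) $$\{WA_3(1 - K_i) - W'\tilde A_3(K_i)\}(\hat\Delta_1 - \Delta''_1) + WA_4\hat\Delta_2 + W'\tilde A_4\Delta''_2 = (WA_4 + W'\tilde A_4)\Delta^*_2.$$

These equations must hold for arbitrary W and W'. Because $\tilde A_2 = -A_2$, letting $W = W'$ eliminates the term $\Delta^*_2$ from (4.4). Hence $A_2 = \tilde A_2 = 0$, for otherwise there would be a linear relation connecting $\hat\Delta_2$ and $\Delta''_2$. Equation (4.4) now has the form of those equations considered in Section 3; thus the entries of $A_1$ and $\tilde A_1$ satisfy

$$a_{ij} = m_j \tilde a_{ij}.$$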


Combining these results gives

$$\lambda_{it} = \lambda_{1t}, \quad i = 2, 3, \dots, a; \ t = a+1, a+2, \dots, v,$$

and

$$\lambda_{ij} = \lambda_{1j}, \quad i, j = 2, 3, \dots, a \ (i \ne j).$$

However, the choice of variety $v_1$ in forming the equations involving $c_i - c_1$ was arbitrary, any variety $v_i$ in the set $v_1, v_2, \dots, v_a$ being possible. Consequently, any two varieties $v_i$ and $v_j$ from this set occur a constant number $\lambda_0$ of times together, and any other variety $v_t$ occurs a constant number $\lambda_t$ of times with each of the varieties $v_1, v_2, \dots, v_a$.

SPECIAL CASES.


THEOREM. In a partially balanced incomplete block design, a necessary and sufficient condition for $(v_i - v_j)' = v^*_i - v^*_j$ for any two mth associates $v_i$ and $v_j$ is $p^m_{ij} = 0$ $(i \ne j)$; that is, the matrix $P_m$ is the diagonal matrix $(p^m_{ii})$.

PROOF. Let $v_1$ and $v_2$ be mth associates; hence they occur $\lambda_m$ times together. The ith associates of $v_1$ occur $\lambda_i$ times with $v_1$ and therefore, by the preceding theorem, must also occur $\lambda_i$ times with $v_2$, and so are ith associates of $v_2$. Thus the number of ith associates common to $v_1$ and $v_2$ is the total number of ith associates of $v_1$ (or $v_2$); that is,

$$p^m_{ii} = n_i \qquad (i \ne m).$$

Since

$$\sum_j p^m_{ij} = n_i \quad (i \ne m), \qquad \sum_j p^m_{mj} = n_m - 1,$$

we have $p^m_{ij} = 0$ for $i \ne j$.

Conversely, if $p^m_{ij} = 0$ $(i \ne j)$, then the number of ith associates common to both $v_1$ and $v_2$ (where $v_1$ and $v_2$ are mth associates) is $n_i$, so that any ith associate of $v_1$ is also an ith associate of $v_2$. This means that any $v_t$ not an mth associate of $v_1$ or $v_2$ occurs $\lambda_t$ times with $v_1$ and $v_2$ and with all other varieties in the mth associate class. Hence, by the preceding theorem, the difference between any two mth associates can be estimated either by Method (1) or (2).

COROLLARY. If there are only two associate classes, the resulting design is group divisible.

PROOF. $p^m_{ij} = 0$ $(i \ne j)$. Therefore, taking the mth associates to be the second associates,

$$P_2 = \begin{pmatrix} n_1 & 0 \\ 0 & n_2 - 1 \end{pmatrix},$$

which is sufficient to ensure group divisibility [2].





For the group-divisible design, using the usual notation [5], if $v_i$ and $v_j$ are second associates, $\operatorname{var}(v''_i - v''_j) = 2kB''_2/W'\Delta'' = 2k/W'A''_{12}$, since $p^2_{12} = 0$, and $\operatorname{var}(\hat v_i - \hat v_j) = 2k/WA_{12}$. The variance of the combined estimate as found by Method (1) is, consequently,

$$\frac{(2k/W'A''_{12})(2k/WA_{12})}{2k/W'A''_{12} + 2k/WA_{12}} = \frac{2k}{WA_{12} + W'A''_{12}} = \frac{2kB^*_2}{\Delta^*},$$

which is the variance of the combined estimate as found by Method (2). It can easily be verified that this is not true for first-associate variety differences.

Consider the case of four associate classes, where $p^4_{ij} = 0$ $(i \ne j)$. Using the notation of [7], and noting that in this case $A_{41} = A_{42} = A_{43} = 0$ ([7], p. 134), the variance of the combined estimate found by Method (2) is

$$\operatorname{var}(v^*_i - v^*_j) = 2k/A^*_{44} = 2k/(WA_{44} + W'A''_{44}),$$

where $v_i$ and $v_j$ are 4th associates, and $A''_{44} = r - \lambda_4$. Thus

$$\frac{2k}{A^*_{44}} = \frac{2k}{WA_{44} + W'A''_{44}} = \frac{(2k/WA_{44})(2k/W'A''_{44})}{2k/WA_{44} + 2k/W'A''_{44}},$$

which is the variance of the combined estimate as found by Method (1), since $\operatorname{var}(\hat v_i - \hat v_j) = 2k/WA_{44}$ ([7], pp. 129-130), and, by analogy, $\operatorname{var}(v''_i - v''_j) = 2k/W'A''_{44}$.


5. Estimates of subsets of varieties. Section 4 showed that it is sometimes possible to obtain combined estimates of certain subsets of variety differences by either Method (1) or Method (2). Obviously, if Methods (1) and (2) give the same results for all variety differences, the design is completely balanced. This raises the question whether there exists a design for which it is possible to obtain the best combined estimate either by Method (1) or Method (2) for a subset of varieties and not for the remaining varieties.

THEOREM. If, in an incomplete block design $(v, b, r, k, \lambda_{ij})$, $v'_i = v^*_i$ for $i = 1, 2, \dots, a$, then all varieties occur the same number of times with varieties $v_1, v_2, \dots, v_a$.
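PROOF. Equation (2.3) can be written, using the methods of Section 4, in the partitioned form

$$W\left[\begin{pmatrix} C_1 \\ C_2 \end{pmatrix} - \begin{pmatrix} A_1 & A_2 \\ A_3 & A_4 \end{pmatrix}\begin{pmatrix} V^*_1 \\ V^*_2 \end{pmatrix}\right] + W'\left[\begin{pmatrix} \tilde C_1 \\ \tilde C_2 \end{pmatrix} - \begin{pmatrix} \tilde A_1 & \tilde A_2 \\ \tilde A_3 & \tilde A_4 \end{pmatrix}\begin{pmatrix} V^*_1 \\ V^*_2 \end{pmatrix}\right] = 0,$$

where the subscript 1 refers to the varieties $v_1, \dots, v_a$ and the subscript 2 to the remaining varieties, and $V^*_1 = V'_1 = (K_i)\hat V_1 + (1 - K_i)V''_1$. Substituting these values back into the equation, we get

(5.1) $$[WA_1(1 - K_i) - W'\tilde A_1(K_i)]\hat V_1 - [WA_1(1 - K_i) - W'\tilde A_1(K_i)]V''_1 + WA_2\hat V_2 + W'\tilde A_2 V''_2 - (WA_2 + W'\tilde A_2)V^*_2 = 0,$$

(5.2) $$[WA_3(1 - K_i) - W'\tilde A_3(K_i)]\hat V_1 - [WA_3(1 - K_i) - W'\tilde A_3(K_i)]V''_1 + WA_4\hat V_2 + W'\tilde A_4 V''_2 - (WA_4 + W'\tilde A_4)V^*_2 = 0.$$

As in Section 4, these equations must hold for all W and W', and hence must be true for $W = W'$. When this is so, the term in $V^*_2$ disappears in (5.1), since $\tilde A_2 = -A_2$. But $\hat V_1$, $\hat V_2$, $V''_1$, and $V''_2$ cannot be linearly related, and so $A_2 = \tilde A_2 = 0$; because these matrices do not contain W or W', they must therefore always be 0. Then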


$$WA_1(1 - K_i) = W'\tilde A_1(K_i),$$

and the same arguments as those used in Sections 3 and 4 can be applied. Thus

$$\lambda_{ij} = \lambda_{iv}, \quad i = 1, 2, \dots, a; \ j = a+1, a+2, \dots, v,$$

and

$$\lambda_{ij} = \lambda_{iv}, \quad i, j = 1, 2, \dots, a \ (i \ne j).$$

This means that any variety $v_j$ occurs a constant number of times with $v_1, v_2, \dots, v_a$, since we can show (as at the end of Section 3) that $\lambda_{iv}$ is constant for $i \le a$.


Acknowledgment. The author would like to thank Prof. B. A. Griffith for 
many helpful criticisms and suggestions. 


REFERENCES 
[1] R. C. Bose, "On the construction of balanced incomplete block designs," Ann. Eugenics, Vol. 10 (1939), pp. 353-399.
[2] R. C. Bose and W. S. Connor, "Combinatorial properties of group-divisible incomplete block designs," Ann. Math. Stat., Vol. 23 (1952), pp. 367-383.
[3] R. C. Bose and K. R. Nair, "Partially balanced incomplete block designs," Sankhyā, Vol. 4 (1938), pp. 337-372.
[4] W. G. Cochran and G. M. Cox, Experimental Designs, John Wiley & Sons, Inc., New York, 1950, p. 265.
[5] O. Kempthorne, The Design and Analysis of Experiments, John Wiley & Sons, Inc., New York, 1952, pp. 534-535.
[6] H. B. Mann, Analysis and Design of Experiments, Dover Publications, New York, 1949, pp. 172-177.
[7] K. R. Nair, "Analysis of partially balanced incomplete block designs illustrated on the simple square and rectangular lattices," Biometrics, Vol. 8 (1952), pp. 122-155.
[8] C. R. Rao, "General methods of analysis for incomplete block designs," J. Amer. Stat. Assn., Vol. 42 (1947), pp. 541-561.
[9] F. Yates, "The recovery of interblock information in balanced incomplete block designs," Ann. Eugenics, Vol. 10 (1940), pp. 317-325.





ASYMPTOTIC MINIMAX CHARACTER OF THE SAMPLE DISTRIBUTION 
FUNCTION AND OF THE CLASSICAL MULTINOMIAL ESTIMATOR 


By A. Dvoretzky,¹ J. Kiefer,¹ and J. Wolfowitz²
Cornell University 


0. Summary. This paper is devoted, in the main, to proving the asymptotic 
minimax character of the sample distribution function (d.f.) for estimating an unknown d.f. in $\mathfrak{F}$ or $\mathfrak{F}_c$ (defined in Section 1) for a wide variety of weight functions. Section 1 contains definitions and a discussion of measurability considerations. Lemma 2 of Section 2 is an essential tool in our proofs and seems to be of interest per se; for example, it implies the convergence of the moment generating function of $G_n$ to that of $G$ (definitions in (2.1)). In Section 3 the asymptotic
minimax character is proved for a fundamental class of weight functions which 
are functions of the maximum deviation between estimating and true d.f. In 
Section 4 a device (of more general applicability in decision theory) is employed 
which yields the asymptotic minimax result for a wide class of weight functions 
of this character as a consequence of the results of Section 3 for weight functions 
of the fundamental class. In Section 5 the asymptotic minimax character is 
proved for a class of integrated weight functions. A more general class of weight 
functions for which the asymptotic minimax character holds is discussed in 
Section 6. This includes weight functions for which the risk function of the sample 
d.f. is not a constant over $\mathfrak{F}_c$. Most weight functions of practical interest are
included in the considerations of Sections 3 to 6. Section 6 also includes a dis- 
cussion of multinomial estimation problems for which the asymptotic minimax 
character of the classical estimator is contained in our results. Finally, Section 7 
includes a general discussion of minimization of symmetric convex or monotone 
functionals of symmetric random elements, with special consideration of the 
“tied-down” Wiener process, and with a heuristic proof of the results of Sections 
3, 4, 5, and much of Section 6. 


1. Introduction and Preliminaries. Throughout this paper we shall denote by $\mathfrak{F}$ the class of all univariate d.f.'s and by $\mathfrak{F}_c$ the subclass of continuous members of $\mathfrak{F}$ (for the sake of definiteness, members of $\mathfrak{F}$ will be considered continuous on the right). Let $R^n$ denote n-dimensional Euclidean space, and let G be any subspace of the space of all real-valued functions on $R^1$. For simplicity we assume $\mathfrak{F} \subset G$, although it is really only necessary that G contain the function $S_n$, defined below, for every $x^{(n)}$. Let B be the smallest Borel field on G such that every element of $\mathfrak{F}$ is an element of B and such that, for every positive integer k and all sets of real numbers $\{t_1, \dots, t_k\}$ and $\{a_1, \dots, a_k\}$ with $t_1 < t_2 < \cdots <$


Received May 31, 1955. Revised October 5, 1955.

¹ Research sponsored by the Office of Naval Research.
² The research of this author was supported in part by the United States Air Force under Contract No. AF18(600)-685 monitored by the Office of Scientific Research.







ASYMPTOTIC MINIMAX CHARACTER 643 


$t_k$, the set $\{g \mid g \in G;\ g(t_1) < a_1, \dots, g(t_k) < a_k\}$ is in B. (Thus, we might have $G = \mathfrak{F}$ and B the Borel field generated by open sets in the common metric topology.) Let $D_n$ be the class of all real-valued functions $\varphi_n$ on $B \times R^n$ with the following properties: for each $x^{(n)} \in R^n$, $\varphi_n(\cdot\,; x^{(n)})$ is a probability measure (B) on G; and for each $A \in B$, $\varphi_n(A; \cdot)$ is Borel-measurable on $R^n$.

The problem which confronts the statistician may now be described. Let $X_1, \dots, X_n$ be independently and identically distributed according to some d.f. F about which it is known only that $F \in \mathfrak{F}_c$ (or even $F \in \mathfrak{F}$). The statistician is to estimate F. Write $X^{(n)} = (X_1, \dots, X_n)$. Having observed $X^{(n)} = x^{(n)} = (x_1, \dots, x_n)$, the statistician uses the decision function $\varphi_n$ as follows: a function $g \in G$ is selected by means of a randomization according to the probability measure $\varphi_n(\cdot\,; x^{(n)})$ on G; the function g so selected (which need not be a member of $\mathfrak{F}$) is then the statistician's estimate of the unknown F. It is desirable to select a procedure $\varphi_n$ which may be expected to yield a g which will lie close to the true F, whatever the latter may be; the term "close" will be made precise in succeeding sections. We note that the decision procedure $\varphi^*_n$ which for each $x^{(n)}$ assigns probability one to the "sample d.f." $S_n$ defined by

$$S_n(x) = (\text{number of } x_i \le x)/n$$

is a member of $D_n$.
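The sample d.f. $S_n$ that $\varphi^*_n$ selects with probability one is straightforward to compute. The following Python sketch is only an illustration; the name `sample_df` is hypothetical. It counts the observations not exceeding x by binary search on the sorted sample, which makes $S_n$ continuous on the right, matching the convention adopted for members of $\mathfrak{F}$.

```python
from bisect import bisect_right


def sample_df(sample):
    """Return the sample d.f. S_n, S_n(x) = (number of x_i <= x)/n, continuous on the right."""
    xs = sorted(sample)
    n = len(xs)

    def S_n(x):
        return bisect_right(xs, x) / n

    return S_n


S = sample_df([3.0, 1.0, 2.0, 2.0])
print([S(x) for x in (0.5, 1.0, 2.0, 2.5, 3.0)])  # [0.0, 0.25, 0.75, 0.75, 1.0]
```

Using `bisect_right` rather than `bisect_left` is what counts ties at x itself, so each jump of $S_n$ occurs exactly at an observation.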

We now turn (in this and the four succeeding paragraphs) to measure-theoretic considerations which are relevant to this paper. Our point of view is to waste as little space as possible on these considerations, since our results hold under any measurability assumptions which imply the meaningfulness of certain probabilities and integrals involving elements $\varphi$ of $D_n$, and, in fact, our results hold even if these are interpreted as inner measures and integrals (which will be proper ones when $\varphi = \varphi^*_n$), as we shall now see.

In Sections 3, 4, and 6 we shall be concerned, for a given n, $\varphi \in D_n$, $r > 0$, and $F \in \mathfrak{F}$, with the probability that, when the procedure $\varphi_n$ is used and the $X_i$ have d.f. F, the selected estimate g of F will satisfy the inequality

$$\sup_x |g(x) - F(x)| > r.$$

We shall denote this probability by

(1.1) $$P_{F,\varphi}\{\sup_x |g(x) - F(x)| > r\}.$$


It is clear when $\varphi = \varphi_n^*$ that this probability is well defined. This probability will also be meaningful if $G$ is sufficiently regular; for example, if $G$ consists of functions continuous on the right, the supremum in the displayed expression is unchanged if it is taken over rational $x$, and the probability in question is well defined. For our considerations it is not even necessary to restrict $G$ in this way; we need not concern ourselves with questions of measurability of

$$\sup_x |g(x) - F(x)|,$$





644 A. DVORETZKY, J. KIEFER AND J. WOLFOWITZ 


since the optimal properties proved for $\varphi_n^*$ hold if the supremum is taken only over the rationals (this last supremum is never greater than the supremum over all $x$ and is equal to the latter when $g = S_n$). Thus, for arbitrary $G$ and $\varphi$, the "probability" expression displayed above may be interpreted with the supremum taken over the rationals (or, alternatively, as an inner measure, or as the infimum over all positive integers $k$ and sets of real numbers $t_1, \dots, t_k$ of

$$P_{F,\varphi}\Big\{\max_{1 \le i \le k} |g(t_i) - F(t_i)| > r\Big\}).$$
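When $g = S_n$ and $F$ is continuous, the supremum can be computed exactly from the order statistics, and a maximum over any finite set of points $t_1, \dots, t_k$ never exceeds it. The sketch below checks both facts numerically; the function names are illustrative, and $F$ is taken (as an assumption for the example) to be the uniform d.f. on $[0, 1]$.

```python
import bisect
import random

def sup_deviation(sample, F):
    """Exact sup_x |S_n(x) - F(x)| for continuous F, using the order statistics."""
    xs = sorted(sample)
    n = len(xs)
    # At x_(i), S_n jumps from (i-1)/n to i/n; both sides of the jump are checked.
    return max(max(i / n - F(x), F(x) - (i - 1) / n) for i, x in enumerate(xs, start=1))

def finite_max_deviation(sample, F, points):
    """max over the finite set `points` of |S_n(t) - F(t)|."""
    xs = sorted(sample)
    n = len(xs)
    return max(abs(bisect.bisect_right(xs, t) / n - F(t)) for t in points)

def uniform_df(x):
    return min(max(x, 0.0), 1.0)

rng = random.Random(1)
data = [rng.random() for _ in range(50)]
exact = sup_deviation(data, uniform_df)
for k in (5, 50, 5000):
    grid = [i / (k + 1) for i in range(1, k + 1)]
    approx = finite_max_deviation(data, uniform_df, grid)
    print(k, round(approx, 4), round(exact, 4))  # approx <= exact; the gap shrinks as k grows
```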


In Sections 4 and 6 expressions such as

$$\int W(r)\, d_r P_{F,\varphi}\Big\{\sup_x |g(x) - F(x)| \le r\Big\} \tag{1.2}$$

appear, the integral being taken over the nonnegative reals with $W \ge 0$ and nondecreasing. The probability appearing here is to be interpreted as unity minus the probability previously displayed in (1.1), but the integral is to be interpreted as including a term $\gamma \lim_{r\to\infty} W(r)$ if $\gamma > 0$, where

$$\gamma = \lim_{r\to\infty} P_{F,\varphi}\Big\{\sup_x |g(x) - F(x)| > r\Big\}.$$


In Sections 5 and 6 we will encounter such expressions as

$$r(F,\varphi) = E_{F,\varphi} \int W\big(g(x) - F(x), F(x)\big)\, dF(x), \tag{1.3}$$

or such an expression with the first two symbols (operations) interchanged. Here $W(x, t)$ is defined for $x$ real and $0 \le t \le 1$, is measurable (in the Borel sense on $R^2$), is nonnegative, and for each $t$ is even in $x$ and nondecreasing in $x$ for $x \ge 0$. $E_{F,\varphi}$ is the operation of expectation when the procedure $\varphi$ is used and the $X_i$ have d.f. $F$. If $\varphi = \varphi_n^*$, $r(F, \varphi)$ is clearly well defined. For other $\varphi$, any of a number of general assumptions on $W$ and $G$ will suffice to make the integral meaningful; for example, if $W$ is continuous, $F \in \mathfrak{F}_c$, and $G$ consists of functions continuous on the right, then the integral is determined by the values of $g$ on the rationals, and $r(F, \varphi)$ is meaningful. Weaker assumptions may be made, and, in fact, one could treat $r(F, \varphi)$ as an inner integral (which is a proper integral when $\varphi = \varphi_n^*$) and still obtain the optimum properties for $\varphi_n^*$ which are derived in this paper.
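As an illustration of (1.3), take (our choice, for concreteness) $W(x, t) = x^2$, $F$ uniform on $[0, 1]$ and $\varphi = \varphi_n^*$. Since $E[S_n(x) - x]^2 = x(1 - x)/n$, the risk is $\int_0^1 x(1-x)/n\, dx = 1/(6n)$. The sketch integrates $(S_n(x) - x)^2$ exactly over each interval on which $S_n$ is constant, cross-checks against the usual closed form of the Cramér-von Mises statistic, and estimates the risk by simulation.

```python
import random

def cvm_integral(sample):
    """Exact value of the integral of (S_n(x) - x)^2 over [0, 1] for a sample in [0, 1]."""
    xs = sorted(sample)
    n = len(xs)
    knots = [0.0] + xs + [1.0]
    total = 0.0
    for i in range(n + 1):
        a, b, s = knots[i], knots[i + 1], i / n  # S_n equals i/n on [x_(i), x_(i+1))
        total += ((b - s) ** 3 - (a - s) ** 3) / 3.0
    return total

rng = random.Random(7)
n, reps = 10, 20000
risk = sum(cvm_integral([rng.random() for _ in range(n)]) for _ in range(reps)) / reps
print(risk, 1 / (6 * n))  # Monte Carlo estimate vs. the exact value 1/(6n)
```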

Finally, in Sections 3, 4, 5, and 6, the method of proof used involves integration of expressions such as (1.1), (1.2), and (1.3) with respect to probability measures $\xi_{kn}$ on $\mathfrak{F}_c$. These $\xi_{kn}$ will always be measures $(B)$ and, in fact, will be of a very simple form. Sometimes the order of integration will be interchanged in these sections. If $\varphi = \varphi_n^*$, the above operations are all easily justified. For other $\varphi$ these operations may be justified, as in the previous three paragraphs, by suitable regularity assumptions on $G$ and $W$; or, again, the integrals in question may be considered as inner integrals.







2. Two Lemmas. In this section we shall state two lemmas (and a corollary to the second) which will be used to prove the results of Sections 3 and 4, respectively. Lemma 1 is due to Anderson [8], while Lemma 2 is derived from results of Smirnoff [9].

For any set $S$ in $R^n$ and any $n$-vector $\rho$, we write $S + \rho = \{x \mid x - \rho \in S\}$. Denote $m$-dimensional Lebesgue measure by $\mu_m$. The case of Anderson's result which will be of use to us is the following:

Lemma 1. Let $P$ be a (possibly degenerate)$^3$ normal probability measure on $R^n$ with means zero, and let $T$ be any convex body in $R^n$ which is symmetric about the origin. Then $P(T) \ge P(T + \rho)$ for all $\rho$.

We shall also use (in Section 5) the trivial fact that the result of Lemma 1 holds for $n = 1$ when $P$ is a normal probability measure truncated at $(-\beta, \beta)$ for $\beta > 0$. In Section 7 we shall mention briefly an application of the more general form of Lemma 1 given in [8].
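Lemma 1 can be checked by hand in the simplest special case: independent normal coordinates and a centered rectangle $T$, where $P(T + \rho)$ factors into one-dimensional interval probabilities, each largest at zero shift. The sketch below covers only that special case (it is not a proof of Lemma 1), together with the truncated normal on $(-\beta, \beta)$ mentioned above; `Phi` is the standard normal d.f. computed from `math.erf`, and all names are ours.

```python
import math

def Phi(x):
    """Standard normal d.f."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def p_interval(a, rho, sigma=1.0, beta=None):
    """P(X in [-a, a] + rho) for X ~ N(0, sigma^2), optionally truncated to (-beta, beta)."""
    lo, hi = -a + rho, a + rho
    if beta is None:
        return Phi(hi / sigma) - Phi(lo / sigma)
    lo, hi = max(lo, -beta), min(hi, beta)
    if hi <= lo:
        return 0.0
    mass = Phi(beta / sigma) - Phi(-beta / sigma)
    return (Phi(hi / sigma) - Phi(lo / sigma)) / mass

def p_rectangle(halfwidths, rho, sigmas):
    """Independent coordinates: P(X in T + rho) for the centered rectangle T."""
    p = 1.0
    for a, r, s in zip(halfwidths, rho, sigmas):
        p *= p_interval(a, r, s)
    return p

print(p_rectangle((1.0, 0.5), (0.0, 0.0), (1.0, 2.0)),
      p_rectangle((1.0, 0.5), (0.5, 0.5), (1.0, 2.0)))  # the shifted value is smaller
```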

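Before stating Lemma 2, we shall introduce some notation. Let $U$ denote the uniform d.f. (i.e., the d.f. whose density with respect to $\mu_1$ is unity) on $[0, 1]$, and write, for $r \ge 0$,

$$\begin{aligned}
G_n(r) &= P_U\Big\{\sup_{0\le x\le 1} |S_n(x) - x| \le r/\sqrt{n}\Big\}, \\
G_{k,n}(r) &= P_U\Big\{\max_{1\le i\le k} \big|S_n(i/(k+1)) - i/(k+1)\big| \le r/\sqrt{n}\Big\}, \\
G(r) &= 1 - 2\sum_{m=1}^{\infty} (-1)^{m+1} e^{-2m^2r^2}, \\
H_n(r) &= P_U\Big\{\sup_{0\le x\le 1} [S_n(x) - x] \le r/\sqrt{n}\Big\}, \\
H_{k,n}(r) &= P_U\Big\{\max_{1\le i\le k} [S_n(i/(k+1)) - i/(k+1)] \le r/\sqrt{n}\Big\}, \\
H(r) &= 1 - e^{-2r^2}.
\end{aligned} \tag{2.1}$$

Then

$$G_{k,n}(r) \ge G_n(r) \quad\text{and}\quad H_{k,n}(r) \ge H_n(r) \tag{2.2}$$

for all $k, n, r$. Moreover,

$$\lim_{k\to\infty} G_{k,n}(r) = G_n(r), \qquad \lim_{k\to\infty} H_{k,n}(r) = H_n(r), \tag{2.3}$$

and ([1], [2], [3])

$$\begin{aligned}
\lim_{k\to\infty}\lim_{n\to\infty} G_{k,n}(r) &= \lim_{n\to\infty} G_n(r) = G(r), \\
\lim_{k\to\infty}\lim_{n\to\infty} H_{k,n}(r) &= \lim_{n\to\infty} H_n(r) = H(r).
\end{aligned} \tag{2.4}$$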


$^3$ The fact that the measure need not be $n$-dimensional necessitates only trivial modifications of the argument in [8].
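The limits $G$ and $H$ in (2.1) are easy to evaluate. The sketch below truncates the series for $G$ after a fixed number of terms (an assumption that is harmless unless $r$ is close to 0, since the terms decay like $e^{-2m^2r^2}$). As checks, $G(1.3581) \approx 0.95$, the familiar 5% point of the limiting Kolmogorov distribution, and $1 - G(r) \le 2(1 - H(r))$, the limiting form of the inequality used at the start of the proof of Lemma 2.

```python
import math

def G(r, terms=100):
    """G(r) = 1 - 2 * sum_{m>=1} (-1)^(m+1) exp(-2 m^2 r^2), truncated after `terms` terms."""
    if r <= 0:
        return 0.0
    s = sum((-1) ** (m + 1) * math.exp(-2.0 * m * m * r * r) for m in range(1, terms + 1))
    return 1.0 - 2.0 * s

def H(r):
    """H(r) = 1 - exp(-2 r^2) for r > 0."""
    return 1.0 - math.exp(-2.0 * r * r) if r > 0 else 0.0

print(round(G(1.3581), 4), round(H(1.2239), 4))  # both close to 0.95
```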







We shall now prove the following:

Lemma 2. There exists a finite positive constant $c$ such that

$$1 - H_n(r) < c\,e^{-2r^2} \tag{2.5}$$

and

$$1 - G_n(r) < c\,e^{-2r^2} \tag{2.6}$$

hold for all $r \ge 0$ and all positive integers $n$.
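Lemma 2 only asserts that some finite $c$ exists. A quick simulation makes the statement concrete. Here we try $c = 2$ (a value later work showed to be admissible, although nothing in this section depends on that) and compare simulated tail probabilities $1 - G_n(r)$ with $2e^{-2r^2}$. The function names, sample sizes and seed are arbitrary illustrative choices.

```python
import math
import random

def scaled_sup(n, rng):
    """sqrt(n) * sup_x |S_n(x) - x| for a uniform sample of size n."""
    u = sorted(rng.random() for _ in range(n))
    d = max(max(i / n - x, x - (i - 1) / n) for i, x in enumerate(u, start=1))
    return math.sqrt(n) * d

def tail_estimates(n, rs, reps=4000, seed=0):
    """Monte Carlo estimates of 1 - G_n(r) for each r in rs."""
    rng = random.Random(seed)
    stats = [scaled_sup(n, rng) for _ in range(reps)]
    return {r: sum(s > r for s in stats) / reps for r in rs}

rs = (0.5, 1.0, 1.5)
for n in (5, 20, 100):
    est = tail_estimates(n, rs)
    for r in rs:
        print(n, r, est[r], round(2 * math.exp(-2 * r * r), 4))
```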

An immediate consequence is

Corollary 2. If $W(r)$ is any nondecreasing nonnegative function defined for $r \ge 0$, then

$$\lim_{n\to\infty} \int_0^\infty W(r)\, dH_n(r) = \int_0^\infty W(r)\, dH(r) \tag{2.7}$$

and

$$\lim_{n\to\infty} \int_0^\infty W(r)\, dG_n(r) = \int_0^\infty W(r)\, dG(r). \tag{2.8}$$

Indeed, the lim inf of the integral on the left side of (2.7) or (2.8) is always $\ge$ the respective integral on the right side. Now, if $\int_0^\infty W(r)\, r e^{-2r^2}\, dr = \infty$, then by (2.1), the integrals on the right side of (2.7) and (2.8) are both infinite and thus (2.7) and (2.8) hold in this case. If, on the other hand,

$$\int_0^\infty W(r)\, r e^{-2r^2}\, dr < \infty,$$

then Corollary 2 follows from (2.4), (2.5), and (2.6), and in this case both sides of (2.7) and (2.8) are finite.

Proof of Lemma 2. Since, by symmetry, $\sup_x [x - S_n(x)]$ has the same distribution as $\sup_x [S_n(x) - x]$, we have $1 - G_n(r) \le 2(1 - H_n(r))$, and it suffices to prove (2.5). We shall deduce (2.5) from the explicit expression for $1 - H_n(r)$ given by Smirnoff [9]. Obviously, $1 - H_n(r) = 0$ for $r \ge \sqrt{n}$, while for $0 < r < \sqrt{n}$, equation (50) of [9] asserts

$$1 - H_n(r) = \Big(1 - \frac{r}{\sqrt{n}}\Big)^n + r\sqrt{n} \sum_{j=[r\sqrt{n}]+1}^{n-1} Q_n(j, r), \tag{2.9}$$

where $[x]$ denotes the greatest integer $\le x$ and

$$Q_n(j, r) = \binom{n}{j} (j - r\sqrt{n})^j (n - j + r\sqrt{n})^{n-j-1} n^{-n}. \tag{2.10}$$

In what follows we may, and do, restrict ourselves to $0 < r < \sqrt{n}$. Taking logarithms and differentiating, it is seen that the maximum of $(1 - r/\sqrt{n})^n e^{2r^2}$ occurs at $r = 0$; hence,

$$\Big(1 - \frac{r}{\sqrt{n}}\Big)^n e^{2r^2} \le 1. \tag{2.11}$$
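Formulas (2.9) and (2.10) are directly computable. The sketch below evaluates $1 - H_n(r)$ in log space (to keep the binomial coefficient from overflowing) and checks it against two cases that can be worked out by hand: $n = 1$, where $1 - H_1(r) = 1 - r$, and $n = 2$, where, with $\epsilon = r/\sqrt{2} < 1/2$, a short calculation gives $1 - \epsilon - \epsilon^2$. The test also checks inequality (2.11) on a small grid.

```python
import math

def smirnov_tail(n, r):
    """1 - H_n(r) from (2.9)-(2.10), i.e. P_U{ sup_x [S_n(x) - x] > r / sqrt(n) }."""
    if r <= 0:
        return 1.0
    if r >= math.sqrt(n):
        return 0.0
    s = r * math.sqrt(n)
    total = (1.0 - r / math.sqrt(n)) ** n
    for j in range(math.floor(s) + 1, n):
        # log Q_n(j, r); lgamma gives log n!, log j!, log (n - j)!
        log_q = (math.lgamma(n + 1) - math.lgamma(j + 1) - math.lgamma(n - j + 1)
                 + j * math.log(j - s) + (n - j - 1) * math.log(n - j + s)
                 - n * math.log(n))
        total += s * math.exp(log_q)
    return total

eps = 0.2
print(smirnov_tail(2, eps * math.sqrt(2)), 1 - eps - eps ** 2)  # both 0.76 up to rounding
print(smirnov_tail(100, 1.0), math.exp(-2.0))                   # compare with e^{-2r^2}
```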







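A simple computation yields, for all $j$ with $r\sqrt{n} < j < n$,

$$\begin{aligned}
\frac{\partial}{\partial r} \log Q_n(j, r) &= -\frac{r n^2}{(j - r\sqrt{n})(n - j + r\sqrt{n})} - \frac{\sqrt{n}}{n - j + r\sqrt{n}} \\
&< -\frac{4r}{1 - \frac{4}{n^2}\big(\frac{n}{2} - j + r\sqrt{n}\big)^2} \\
&\le -4r - \frac{16r}{n^2}\Big(\frac{n}{2} - j + r\sqrt{n}\Big)^2,
\end{aligned}$$

which on integrating gives

$$Q_n(j, r) \le Q_n(j, 0) \exp\Big[-2r^2 - \frac{8r^2}{n^2}\Big(\frac{n}{2} - j + \frac{2r\sqrt{n}}{3}\Big)^2 - \frac{4r^4}{9n}\Big], \tag{2.12}$$

as well as

$$Q_n(j, r) \le c_1 Q_n(j, 1) \exp\Big[-2r^2 - \frac{8r^2}{n^2}\Big(\frac{n}{2} - j + \frac{2r\sqrt{n}}{3}\Big)^2 - \frac{4r^4}{9n}\Big] \tag{2.13}$$

for $r \ge 1$; here $c_1$ denotes a universal finite constant (and similarly, $c_2, c_3, c_4, c_5$ in the sequel).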
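We divide the sum of (2.9) into two parts: $\sum'$ will denote summation over those $j$ for which

$$\Big|j - \frac{n}{2}\Big| \le \frac{n}{4}, \tag{2.14}$$

and $\sum''$ will denote summation over the remaining values. It follows immediately from Stirling's formula that

$$Q_n(j, 0) \le c_2 n^{-3/2}$$

for $j$ satisfying (2.14). Hence we have from (2.12), since the sum over $j$ of $\exp[-(8r^2/n^2)(n/2 - j + 2r\sqrt{n}/3)^2]$ is at most $1 + (n/r)(\pi/8)^{1/2}$,

$$\sideset{}{'}\sum_j Q_n(j, r) \le \frac{c_3}{r\sqrt{n}}\, e^{-2r^2}.$$

Hence,

$$r\sqrt{n} \sideset{}{'}\sum_j Q_n(j, r) \le c_3 e^{-2r^2}. \tag{2.15}$$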







Let us now deal with the $j$ occurring in $\sum''$, i.e., those for which (2.14) does not hold. If $2r\sqrt{n}/3 < n/8$, then the second term in the exponent in (2.13) is $\le -(r^2/8)$, while otherwise $r \ge 3\sqrt{n}/16$ and the last term in the exponent in (2.13) is $\le -(4/9)(3/16)^2 r^2$. Thus, in both cases we have for $r \ge 1$,

$$Q_n(j, r) \le c_1 Q_n(j, 1)\, e^{-(2 + c_4) r^2}.$$

Hence we have from (2.9), which gives $\sqrt{n} \sum_j Q_n(j, 1) \le 1$,

$$r\sqrt{n} \sideset{}{''}\sum_j Q_n(j, r) \le c_1 r\sqrt{n}\, e^{-(2 + c_4) r^2} \sideset{}{''}\sum_j Q_n(j, 1) \le c_5 e^{-2r^2}. \tag{2.16}$$

(2.11), (2.15), and (2.16) imply (2.5) for $1 \le r < \sqrt{n}$, and thus obviously for all $r$.
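The key step in this proof is the bound on $\partial \log Q_n(j, r)/\partial r$ displayed after (2.11). The sketch below compares the closed-form derivative with a central finite difference of $\log Q_n(j, r)$ computed from (2.10), and checks the quadratic upper bound over a small grid of $(n, j, r)$ with $r\sqrt{n} < j < n$. The grid and step size are arbitrary choices.

```python
import math

def log_Q(n, j, r):
    """log Q_n(j, r) from (2.10)."""
    s = r * math.sqrt(n)
    return (math.lgamma(n + 1) - math.lgamma(j + 1) - math.lgamma(n - j + 1)
            + j * math.log(j - s) + (n - j - 1) * math.log(n - j + s) - n * math.log(n))

def dlogQ(n, j, r):
    """Closed form: -r n^2 / ((j - r sqrt n)(n - j + r sqrt n)) - sqrt n / (n - j + r sqrt n)."""
    s = r * math.sqrt(n)
    return -r * n * n / ((j - s) * (n - j + s)) - math.sqrt(n) / (n - j + s)

def dlogQ_bound(n, j, r):
    """Upper bound: -4 r - (16 r / n^2) (n/2 - j + r sqrt n)^2."""
    s = r * math.sqrt(n)
    return -4.0 * r - 16.0 * r / n ** 2 * (n / 2 - j + s) ** 2

h = 1e-6
checked = 0
for n in (20, 100):
    for r in (0.3, 1.0, 2.0):
        for j in range(math.floor(r * math.sqrt(n)) + 1, n):
            numeric = (log_Q(n, j, r + h) - log_Q(n, j, r - h)) / (2 * h)
            assert abs(numeric - dlogQ(n, j, r)) <= 1e-4 * (1 + abs(dlogQ(n, j, r)))
            assert dlogQ(n, j, r) <= dlogQ_bound(n, j, r)
            checked += 1
print(checked, "grid points checked")
```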


3. Asymptotic minimax character of $\varphi_n^*$ for a fundamental class of weight functions. In this section we shall prove the asymptotic minimax character of $\varphi_n^*$ (as $n \to \infty$) in a sense which is fundamental in that the minimax character relative to all reasonable weight functions of a certain type will follow (in Section 4) from the results of the present section. We shall now prove the following strong property of $\varphi_n^*$:

Theorem 3. For every value $r > 0$,

$$\lim_{n\to\infty} \frac{\sup_{F\in\mathfrak{F}_c} P_F\{\sup_x |S_n(x) - F(x)| > r/\sqrt{n}\}}{\inf_{\varphi\in D_n} \sup_{F\in\mathfrak{F}_c} P_{F,\varphi}\{\sup_x |g(x) - F(x)| > r/\sqrt{n}\}} = 1. \tag{3.1}$$

In fact, the probability in the numerator of (3.1) is independent of $F$ for $F \in \mathfrak{F}_c$, and is no greater for any $F \in \mathfrak{F} - \mathfrak{F}_c$ than for $F \in \mathfrak{F}_c$ (see [1]); as an immediate consequence of Theorem 3, we thus have

Corollary 3. The result of Theorem 3 holds if $\mathfrak{F}_c$ is replaced by $\mathfrak{F}$ in its statement.
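The two facts quoted before Corollary 3 can be seen pathwise through the probability integral transform: if $X = F^{-1}(U)$ then $\{X \le x\} = \{U \le F(x)\}$, so the deviation at $x$ equals the uniform-sample deviation at $F(x)$. For continuous $F$ the values $F(x)$ sweep out all of $(0, 1)$ and the supremum is the same as in the uniform case; for a discontinuous $F$ it is taken over a subset and can only be smaller. The sketch below uses an exponential $F$ and a three-point $F$ (both our choices), driven by the same uniforms.

```python
import math
import random

def sup_dev_continuous(sample, F):
    """sup_x |S_n(x) - F(x)| for continuous F (checked at the order statistics)."""
    xs = sorted(sample)
    n = len(xs)
    return max(max(i / n - F(x), F(x) - (i - 1) / n) for i, x in enumerate(xs, start=1))

rng = random.Random(42)
u = [rng.random() for _ in range(25)]
n = len(u)

# Continuous case: Exp(1) data from the same uniforms via X = -log(1 - U).
x_exp = [-math.log1p(-v) for v in u]
d_unif = sup_dev_continuous(u, lambda t: t)
d_exp = sup_dev_continuous(x_exp, lambda t: -math.expm1(-t))

# Discontinuous case: X = ceil(3U) takes the values 1, 2, 3 with probability 1/3 each,
# so F(x) = a/3 on [a, a + 1); the supremum is attained at one of the atoms.
x_disc = [math.ceil(3 * v) for v in u]
d_disc = max(abs(sum(x <= a for x in x_disc) / n - a / 3) for a in (1, 2, 3))

print(d_unif, d_exp, d_disc)  # d_exp equals d_unif up to rounding; d_disc <= d_unif
```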

We also remark that (3.9) and (3.20) below may be used to give an explicit bound on the departure of $\varphi_n^*$ from minimax character; the integer $N$ of (3.9) may be computed explicitly by merely keeping track of the constants which go into various error orders in the proof which follows; an explicit estimate of departure for $n \le N$ could be given similarly. With slightly more difficulty such a bound could also be computed in the cases treated in Sections 4, 5, and 6.

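In order to prove (3.1), we shall exhibit a sequence $\{\xi_{kn}\}$ of a priori probability measures on $\mathfrak{F}_c$ such that, letting $A_k$ ($k$ a positive integer) denote the set consisting of the $k$ points $i/(k+1)$ (for $1 \le i \le k$), we have

$$\begin{aligned}
&\lim_{k\to\infty} \liminf_{n\to\infty} \inf_{\varphi\in D_n} \int P_{F,\varphi}\Big\{\sup_{a\in A_k} |g(a) - F(a)| > r/\sqrt{n}\Big\}\, d\xi_{kn} \\
&\quad= \lim_{k\to\infty}\lim_{n\to\infty} \int P_F\Big\{\sup_{a\in A_k} |S_n(a) - F(a)| > r/\sqrt{n}\Big\}\, d\xi_{kn} \\
&\quad= \lim_{k\to\infty}\lim_{n\to\infty} P_U\Big\{\sup_{a\in A_k} |S_n(a) - a| > r/\sqrt{n}\Big\},
\end{aligned} \tag{3.2}$$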







where $U$ is the uniform distribution on $[0, 1]$. Now, the expression under the limit operations on the left side of (3.2) is, for each $n$ and $k$, obviously no greater than the denominator of (3.1) for the same $n$. On the other hand, the right side of (3.2) is equal to the (positive) limit as $n \to \infty$ of the numerator of (3.1), by (2.4). Hence, (3.2) implies (3.1).

In order to prove (3.2), we shall for each $k$ limit ourselves to measures $\xi_{kn}$ which assign probability one to distribution functions in $\mathfrak{F}_c$ of the form

$$F_\pi(x) = \sum_{i=1}^{k+1} p_i U_i(x), \qquad p_i \ge 0, \quad \sum_i p_i = 1, \tag{3.3}$$

where $U_i(x)$ is the uniform probability distribution on the interval $[(i-1)/(k+1), i/(k+1)]$. For fixed $k$ and $n$, it is easily seen that a sufficient statistic for the vector $\{p_i\}$ (and thus, for the family of $F_\pi$'s of the form (3.3)) is given by the vector $T^{(n)} = \{T_1^{(n)}, T_2^{(n)}, \dots, T_{k+1}^{(n)}\}$, where $T_i^{(n)}$ is equal to the number of components of $x^{(n)}$ which lie in the interval $[(i-1)/(k+1), i/(k+1)]$. Hence, the validity of (3.2) will be implied by the following stronger result. Let $B_k$ be the family of vectors $\pi = \{p_i, 1 \le i \le k+1\}$ satisfying $p_i \ge 0$, $\sum_i p_i = 1$; $T^{(n)}$ has the multinomial distribution arising from $n$ observations on $k+1$ types of objects, according to some $\pi \in B_k$, i.e., for integers $t_i \ge 0$ with $\sum_{i=1}^{k+1} t_i = n$,

$$P_\pi\{T_i^{(n)} = t_i,\ 1 \le i \le k+1\} = \frac{n!}{t_1! \cdots t_{k+1}!}\, p_1^{t_1} \cdots p_{k+1}^{t_{k+1}}; \tag{3.4}$$

$\Psi_n$ is the class of all (possibly randomized) vector estimators

$$\psi_n = \{\psi_{n1}, \dots, \psi_{n,k+1}\}$$

of $\pi = \{p_i\}$ based on $T^{(n)}$ ($\psi_n$ need not take on values in $B_k$); the $\xi_{kn}$ are probability measures on $B_k$, which will be chosen so that

$$\begin{aligned}
&\lim_{n\to\infty} \inf_{\psi_n\in\Psi_n} \int P_{\pi,\psi_n}\Big\{\max_{1\le i\le k} \Big|\sum_{j=1}^{i} (\psi_{nj} - p_j)\Big| > r/\sqrt{n}\Big\}\, d\xi_{kn} \\
&\quad= \lim_{n\to\infty} \int P_\pi\Big\{\max_{1\le i\le k} \Big|\sum_{j=1}^{i} \big(T_j^{(n)}/n - p_j\big)\Big| > r/\sqrt{n}\Big\}\, d\xi_{kn} \\
&\quad= \lim_{n\to\infty} P_{V_k}\Big\{\max_{1\le i\le k} \Big|\sum_{j=1}^{i} \big(T_j^{(n)}/n - 1/(k+1)\big)\Big| > r/\sqrt{n}\Big\},
\end{aligned} \tag{3.5}$$

where $V_k = \{1/(k+1), \dots, 1/(k+1)\} \in B_k$. Taking limits as $k \to \infty$ (we have seen that this limit exists for the last expression of (3.5)), we see that the demonstration of (3.5) will imply that of (3.2). If we prove (3.5) with $\Psi_n$ replaced by the class of nonrandomized $\psi_n$, then (3.5) will a fortiori be true in the form stated above. Hence, in what follows, all $\psi_n$ will be nonrandomized.
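Under $U$ the cell counts $T_j^{(n)}$ of (3.4) have the multinomial distribution with $\pi = V_k$, and the partial sums $\sum_{j\le i} T_j^{(n)}/n$ are just $S_n(i/(k+1))$; this is why the last expression of (3.5) coincides with $P_U\{\sup_{a\in A_k} |S_n(a) - a| > r/\sqrt{n}\}$. The sketch below checks this identity on a simulated sample and shows that, for nested grids ($k + 1 = 2, 4, 8, 16$), the statistic can only grow as $k$ increases. The function names are ours.

```python
import bisect
import random

def cell_counts(sample, k):
    """T_j = number of observations in the j-th cell of width 1/(k+1), j = 1, ..., k+1."""
    t = [0] * (k + 1)
    for x in sample:
        t[min(int(x * (k + 1)), k)] += 1
    return t

def partial_sum_stat(sample, k):
    """max_{1<=i<=k} |sum_{j<=i} (T_j/n - 1/(k+1))|, as in the last line of (3.5)."""
    n = len(sample)
    t = cell_counts(sample, k)
    acc, best = 0.0, 0.0
    for i in range(k):
        acc += t[i] / n - 1.0 / (k + 1)
        best = max(best, abs(acc))
    return best

def grid_stat(sample, k):
    """max over a in A_k of |S_n(a) - a|."""
    xs = sorted(sample)
    n = len(xs)
    return max(abs(bisect.bisect_right(xs, i / (k + 1)) / n - i / (k + 1))
               for i in range(1, k + 1))

rng = random.Random(5)
sample = [rng.random() for _ in range(200)]
stats = [partial_sum_stat(sample, k) for k in (1, 3, 7, 15)]
print(stats)  # nondecreasing: each grid A_k contains the previous one
```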

Some intuitive remarks are in order regarding the choice of $\xi_{kn}$ (and the $m_{kn}$ defining it) in the next paragraph. For simplicity, let us consider the case $k = 1$. We are then faced with a binomial estimation problem. The classical estimator of the parameter $p_1$ is asymptotically normal with maximum variance at $p_1 = \tfrac12$ (this is $V_1$; in general, the corresponding phenomenon which concerns us occurs at $\pi = V_k$). In order to obtain our asymptotic Bayes result (3.5), we want $\xi_{1n}$ to approximate a uniform measure on an interval of $p_1$ which has the following properties: on the one hand, the width $\epsilon_n$ of this interval, when multiplied by $\sqrt{n}$, must tend to infinity with $n$; on the other hand, the width itself must tend to zero. In terms of the parameter $\sqrt{n}(p_1 - \tfrac12)$ and random variable $(T_1^{(n)} - n/2)/\sqrt{n}$, we will then be faced, asymptotically, with the problem of estimating the mean of a normal distribution (where, asymptotically, all real values are possible for the mean, with a uniform a priori distribution over a region whose width $\sqrt{n}\,\epsilon_n$ tends to $\infty$) with almost constant variance. The classical estimator will then be asymptotically Bayes for our weight function. Since a uniform a priori distribution would be slightly less simple to use (in keeping track of limits), we use instead one of the form (3.6) below; but the choice of the parameter $m_{kn}$ therein is motivated by the remarks above.

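Let $m = m_{kn}$ = (greatest integer $\le n^{1/4}/k^2$), let $\varepsilon = \varepsilon_{kn} = m/n$, and let $\xi_{kn}$ be the probability measure on $B_k$ which is given rise to by the probability density function

$$h_{kn}(p_1, \dots, p_k) = C_{kn} \Big(1 - \sum_{i=1}^{k} p_i\Big)^m \prod_{i=1}^{k} p_i^m \tag{3.6}$$

with respect to Lebesgue measure on the $k$-simplex $\{0 \le \sum_{i=1}^k p_i \le 1,\ p_i \ge 0\ (1 \le i \le k)\}$ and is zero elsewhere. Here

$$C_{kn} = \Gamma\big((m+1)(k+1)\big) \big/ \big(\Gamma(m+1)\big)^{k+1}.$$

Let $Y_i^{(n)} = T_i^{(n)}/n$. Let $\delta_i = p_i - 1/(k+1)$. The a posteriori density of $\delta_1, \dots, \delta_k$, given that $Y_i^{(n)} = y_i$ $(1 \le i \le k)$ (for possible values of the set $\{y_i\}$) when $\xi_{kn}$ is the a priori probability measure on $B_k$, is (the domain being obvious)

$$f_{kn}(\delta_1, \dots, \delta_k \mid y_1, \dots, y_k) = \Big[C_k' \prod_{i=1}^{k+1} \Big(\delta_i + \frac{1}{k+1}\Big)^{y_i + \varepsilon}\Big]^n, \tag{3.7}$$

where we have written $\delta_{k+1} = -\sum_{i=1}^k \delta_i$ and $y_{k+1} = 1 - \sum_{i=1}^k y_i$ for typographical simplicity; here $(C_k')^n = \Gamma([m+1][k+1] + n) / \prod_{i=1}^{k+1} \Gamma(m + 1 + n y_i)$. Let $\eta_i = \delta_i - y_i + 1/(k+1)$. Then the a posteriori density of $\eta_1, \dots, \eta_k$ under the same conditions is (the domain again being obvious)

$$f_{kn}(\eta_1, \dots, \eta_k \mid y_1, \dots, y_k) = [g_{kn}(\eta_1, \dots, \eta_k \mid y_1, \dots, y_k)]^n = \Big[C_k' \prod_{i=1}^{k+1} (y_i + \eta_i)^{y_i + \varepsilon}\Big]^n, \tag{3.8}$$

where $\eta_{k+1} = -\sum_{i=1}^k \eta_i$.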
We shall now prove that, for each $k$ and each $r^*$ with $0 < r^* < \infty$, we have for $n > N(k, r^*)$ (the latter will be defined below)

$$E_k P^*\Big\{\max_{1\le i\le k} \Big|\sum_{j=1}^{i} \big(p_j - Y_j^{(n)}\big)\Big| \le \frac{r}{\sqrt{n}}\Big\} \ge E_k P^*\Big\{\max_{1\le i\le k} \Big|\sum_{j=1}^{i} (p_j - \psi_{nj})\Big| \le \frac{r}{\sqrt{n}}\Big\} - n^{-1/16} \tag{3.9}$$

for all $r$ with $0 \le r \le r^*$ and all $\psi_n$ (not necessarily positive or summing to unity); here $P^*$ denotes a posteriori probability of $\pi$ (i.e., of $\{p_i\}$) when (3.6) is the a priori distribution, while $E_k$ denotes expectation with respect to the measure on $B_k \times R^{k+1}$ given by (3.6) and (3.4). Noting that the second integral in (3.5) is unity minus the left side of (3.9) and that for each $k$ the left side of (3.9) tends to a limit as $n \to \infty$ (this will follow from (3.20) below), we see that (3.9) actually implies that the first and second expressions of (3.5) are equal for each $k$. On the other hand, the limiting joint distribution function of the set of random variables $\{\sqrt{n}[Y_i^{(n)} - 1/(k+1)],\ 1 \le i \le k\}$ under $V_k$ is well known to be that whose density is given in (3.20), below, if we set all $y_i = 1/(k+1)$ and let $n \to \infty$ in the latter; since (3.20), which is the asymptotic a posteriori joint density of the $\sqrt{n}(p_i - T_i^{(n)}/n)$, is continuous in the $y_i$, and since the $Y_i^{(n)}$ tend in probability (according to (3.6) and (3.4)) to $1/(k+1)$ as $n \to \infty$, it follows that the second and third expressions of (3.5) are equal. (This last follows also from the continuity in $\pi$ of $\lim_{n\to\infty} P_\pi\{\ \}$ in the second expression of (3.5) and the fact that $\lim_{n\to\infty} \xi_{kn}(J) = 1$ for any neighborhood $J$ of $V_k$.) Thus, our theorem will be proved if we prove (3.9), and we now turn to this proof.


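In this demonstration our calculations will be performed under the conditions

$$\begin{aligned}
\Big|y_i - \frac{1}{k+1}\Big| &< \frac{1}{2(k+1)} && (1 \le i \le k+1), \\
|\eta_i| &< \frac{n^{-3/8}}{4k(k+1)} && (1 \le i \le k+1), \\
n &> k^{16}.
\end{aligned} \tag{3.10}$$

All orders $O(\cdot)$ will be uniform in the variables not appearing in the arguments. By (3.8),

$$\log g_{kn} = \log C_k' + \sum_{i=1}^{k+1} (y_i + \varepsilon) \log y_i + \sum_{i=1}^{k+1} (y_i + \varepsilon) \log\Big(1 + \frac{\eta_i}{y_i}\Big). \tag{3.11}$$

From (3.10), we have

$$\Big|\frac{\eta_i}{y_i}\Big| \le \frac{1}{2k n^{3/8}} \le \frac12 \tag{3.12}$$

and hence

$$\log\Big(1 + \frac{\eta_i}{y_i}\Big) = \frac{\eta_i}{y_i} - \frac{\eta_i^2}{2y_i^2} + \theta_i \frac{\eta_i^3}{y_i^3},$$

with

$$|\theta_i| < 1 \qquad (1 \le i \le k+1). \tag{3.13}$$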


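Now, writing

$$(y_i + \varepsilon) \log\Big(1 + \frac{\eta_i}{y_i}\Big) = \eta_i - \frac{\eta_i^2}{2y_i} + \theta_i \frac{\eta_i^3}{y_i^2} + \varepsilon\Big(\frac{\eta_i}{y_i} - \frac{\eta_i^2}{2y_i^2} + \theta_i \frac{\eta_i^3}{y_i^3}\Big)$$

and remarking that $\sum_{i=1}^{k+1} \eta_i = 0$, that by (3.10) and (3.13)

$$\Big|\sum_{i=1}^{k+1} \theta_i \frac{\eta_i^3}{y_i^2}\Big| < (k+1) \cdot \frac{4(k+1)^2}{64 k^3 (k+1)^3 n^{9/8}} = \frac{1}{16 k^3 n^{9/8}},$$

and that by (3.10), (3.12), and the definition of $\varepsilon$

$$\Big|\varepsilon \sum_{i=1}^{k+1} \Big(\frac{\eta_i}{y_i} - \frac{\eta_i^2}{2y_i^2} + \theta_i \frac{\eta_i^3}{y_i^3}\Big)\Big| \le 2\varepsilon \sum_{i=1}^{k+1} \Big|\frac{\eta_i}{y_i}\Big| \le 2 \cdot \frac{1}{k^2 n^{3/4}} \cdot \frac{k+1}{2k n^{3/8}} \le \frac{2}{n^{9/8}},$$

we obtain

$$\sum_{i=1}^{k+1} (y_i + \varepsilon) \log\Big(1 + \frac{\eta_i}{y_i}\Big) = -\sum_{i=1}^{k+1} \frac{\eta_i^2}{2y_i} + \frac{3\theta}{n^{9/8}} \tag{3.14}$$

with $|\theta| < 1$. Combining (3.14) and (3.11), we have

$$\log g_{kn} = \log C_k' + \sum_{i=1}^{k+1} (y_i + \varepsilon) \log y_i - \sum_{i=1}^{k+1} \frac{\eta_i^2}{2y_i} + O(n^{-9/8}). \tag{3.15}$$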

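Next, we note that

$$(C_k')^n \prod_{i=1}^{k+1} y_i^{n(y_i + \varepsilon)} = D_{kn}\, p_N(ny_1 + m, \dots, ny_{k+1} + m;\ y_1, \dots, y_{k+1}), \tag{3.16}$$

where $N = n + (k+1)m$, $D_{kn} = (N+k)!/N!$, and $p_N(w_1, \dots, w_{k+1};\ q_1, \dots, q_{k+1})$ is the (multinomial) probability that among $N$ independent, identically distributed random variables taking on the value $i$ with probability $q_i$ ($\sum q_i = 1$, $q_i \ge 0$), there will be $w_i$ taking on the value $i$ ($\sum w_i = N$). Using the familiar representation of this probability in terms of binomial probabilities, the definition of $m$, the inequalities (3.10), and the estimate for binomial probabilities

$$p_N^{(2)}\Big(Np + t\sqrt{Np(1-p)},\ N(1-p) - t\sqrt{Np(1-p)};\ p, 1-p\Big) = [2\pi Np(1-p)]^{-1/2} e^{-t^2/2}\,\big[1 + O(N^{-1/2})\big] \tag{3.17}$$

for $|t| < C_1$ and $|p - \tfrac12| < C_2 < \tfrac12$ (given in [5], p. 135), we obtain (with a conservative estimate of error)

$$(C_k')^n \prod_{i=1}^{k+1} y_i^{n(y_i + \varepsilon)} = \big(1 + O(n^{-1/8})\big) \Big(\frac{n}{2\pi}\Big)^{k/2} \Big(\prod_{i=1}^{k+1} y_i\Big)^{-1/2}. \tag{3.18}$$

Hence, in the region (3.10) we obtain from (3.15) and (3.18), writing again $\eta_{k+1} = -\sum_{i=1}^k \eta_i$ and $y_{k+1} = 1 - \sum_{i=1}^k y_i$,

$$f_{kn}(\eta_1, \dots, \eta_k \mid y_1, \dots, y_k) = \big(1 + O(n^{-1/8})\big) \Big(\frac{n}{2\pi}\Big)^{k/2} \Big(\prod_{i=1}^{k+1} y_i\Big)^{-1/2} \exp\Big(-\frac{n}{2} \sum_{i=1}^{k+1} \frac{\eta_i^2}{y_i}\Big). \tag{3.19}$$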







For the corresponding a posteriori joint density of $\gamma_i = \sqrt{n}\,\eta_i$, $i = 1, \dots, k$, in the region (3.10), we thus obtain (writing $\gamma_{k+1} = -\sum_{i=1}^k \gamma_i$)

$$\big(1 + O(n^{-1/8})\big) (2\pi)^{-k/2} \Big(\prod_{i=1}^{k+1} y_i\Big)^{-1/2} \exp\Big(-\frac12 \sum_{i=1}^{k+1} \frac{\gamma_i^2}{y_i}\Big). \tag{3.20}$$

Except for the first factor, this is a $k$-dimensional normal distribution centered at the origin. Note also that the probability assigned by this density to the complement of the region $|\gamma_i| < n^{1/8}/4k(k+1)$ of (3.10) (for a single $i$) is (by Chebyshev's inequality) $\le [1 + O(n^{-1/8})]\,O(k^4 n^{-1/4})$, so that the probability of the above inequality on the $\gamma_i$ for all $i$ according to (3.20) is at least $1 - O(k^5 n^{-1/4})$. Also, the $p_i$ of (3.6) have means $1/(k+1)$ and variances $O(m^{-1}k^{-2}) = O(n^{-1/4})$, while $Y_i^{(n)}$ (given the $p_i$) has mean $p_i$ and variance $O(n^{-1})$, whatever the $p_i$ may be. Hence, for a single $i$, the probability (according to (3.6) and (3.4)) that $|Y_i^{(n)} - 1/(k+1)| < 1/2(k+1)$ is

$$\ge P\Big\{\Big|p_i - \frac{1}{k+1}\Big| < \frac{1}{4(k+1)}\Big\} \times P\Big\{\big|Y_i^{(n)} - p_i\big| < \frac{1}{4(k+1)} \;\Big|\; p_i\Big\} \ge 1 - k^2\big\{O(n^{-1/4}) + O(n^{-1})\big\}.$$

The probability that $|Y_i^{(n)} - 1/(k+1)| < 1/2(k+1)$ for all $i$ is thus $\ge 1 - k^3 O(n^{-1/4})$.

We conclude, then, that the region of $Y_i^{(n)}, \eta_i$ $(1 \le i \le k+1)$ specified in (3.10) (putting $Y_i^{(n)}$ for $y_i$), and hence where (3.20) holds, has probability $1 - O(k^5 n^{-1/4})$ according to (3.6) and (3.4).
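Because (3.6) is a symmetric Dirichlet density with parameters $m + 1$, the a posteriori distribution of $\pi$ given $T^{(n)} = t$ is again Dirichlet, with parameters $m + 1 + t_i$, so its first two moments are available in closed form. The sketch below uses counts $t$ chosen by us purely for illustration, with $m = m_{kn}$ as defined before (3.6). It checks that $n$ times the posterior covariance of the $p_i$ is close to $y_i\delta_{ij} - y_i y_j$, the covariance of the normal density (3.20), and that the posterior mean differs from $Y^{(n)}$ by $o(n^{-1/2})$.

```python
import math

def m_kn(n, k):
    """m_{kn}: the greatest integer <= n^(1/4) / k^2."""
    return math.floor(n ** 0.25 / k ** 2)

def dirichlet_posterior_moments(t, m):
    """Mean vector and covariance matrix of Dirichlet(m + 1 + t_1, ..., m + 1 + t_{k+1})."""
    alpha = [m + 1 + ti for ti in t]
    a0 = sum(alpha)
    mean = [a / a0 for a in alpha]
    denom = a0 * a0 * (a0 + 1)
    cov = [[(a * (a0 - a) if i == j else -a * b) / denom
            for j, b in enumerate(alpha)] for i, a in enumerate(alpha)]
    return mean, cov

k = 3
t = [250_400, 249_700, 250_100, 249_800]  # illustrative counts, n = 10**6
n = sum(t)
y = [ti / n for ti in t]
mean, cov = dirichlet_posterior_moments(t, m_kn(n, k))
worst_cov = max(abs(n * cov[i][j] - ((y[i] if i == j else 0.0) - y[i] * y[j]))
                for i in range(k + 1) for j in range(k + 1))
worst_mean = max(math.sqrt(n) * abs(mean[i] - y[i]) for i in range(k + 1))
print(worst_cov, worst_mean)  # both small
```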

Now, for fixed $r^* > 0$, let $N_1(k, r^*)$ be such that if $n > N_1(k, r^*)$, then

$$8r^* < \frac{n^{1/8}}{4k(k+1)}$$

and the probability under (3.20) that all $|\gamma_i|$ are $< n^{1/8}/16k(k+1)$ is $\ge 1 - n^{-1/16}/2$; clearly, such a number $N_1(k, r^*)$ exists. For $0 < r \le r^*$, let $T_r$ be the region where $|\sum_{j=1}^{i} \gamma_j| \le r$, $i = 1, \dots, k+1$. Note that $T_r$ is contained in the region where $|\gamma_i| \le 2r^*$ for all $i$. If $\rho$ is any vector all of whose $(k+1)$ components are $\le n^{1/8}/8k(k+1)$ in magnitude and if $n > N_1(k, r^*)$, then $T_r$ and $T_r + \rho$ both lie entirely in the region of (3.10) (where (3.20) holds), whose probability according to (3.6) and (3.4) is $1 - O(k^5 n^{-1/4})$. Write $C_r$ and $D_r$ for the events in brackets on the left and right sides of (3.9), and define $L = L(X_1, \dots, X_n, \psi_n)$ to be 1 or 0 according to whether or not

$$\max_i \sqrt{n}\,\big|Y_i^{(n)} - \psi_{ni}\big| < \frac{n^{1/8}}{8k(k+1)}.$$

From the previous remarks of this paragraph and Lemma 1 we conclude that

$$E_k\big[L \cdot P^*\{C_r\}\big] \ge E_k\big[L \cdot P^*\{D_r\}\big] - n^{-1/16}/3 \tag{3.21}$$

for $0 < r \le r^*$, $n > N(k, r^*)$, and all $\psi_n$, where $N(k, r^*)$ is chosen (as it clearly may be, because for fixed $k$ the error terms above are $o(n^{-1/16})$) to be enough larger than $N_1(k, r^*)$ to give the term $n^{-1/16}/3$ in (3.21). On the other hand, if any component of $\rho$ has magnitude $> n^{1/8}/8k(k+1)$, then with probability $1 - O(k^5 n^{-1/4})$ according to (3.4) and (3.6), $T_r + \rho$ has a posteriori probability $< n^{-1/16}/2$. Hence, the $N(k, r^*)$ above may clearly also be chosen so large that

$$E_k\big[(1 - L)\, P^*\{D_r\}\big] - 2n^{-1/16}/3 \le 0 \tag{3.22}$$

for $0 < r \le r^*$, $n > N(k, r^*)$, and all $\psi_n$. Equation (3.9) follows from (3.21) and (3.22), completing the proof of Theorem 3.

We remark that $\varphi_n^*$ will not be minimax in the sense of Theorem 3 for all $r$ and fixed finite $n$. The first nontrivial case is that of $n = 3$. A tiresome but straightforward computation in this case shows that, among the procedures $\varphi_c$ which for a given number $c$ $(0 \le c \le \tfrac12)$ assign probability one to

$$g_c(x) = \begin{cases} 0 & \text{if } x < Z_1, \\ c & \text{if } Z_1 \le x < Z_2, \\ 1 - c & \text{if } Z_2 \le x < Z_3, \\ 1 & \text{if } Z_3 \le x, \end{cases}$$

where the $Z_i$ are the ordered $X_i$'s, the expression $P_U\{\sup_x |g_c(x) - x| \le z\}$ is maximized for $\tfrac16 < z \le \tfrac13$ at $c = \tfrac13$ (i.e., by $\varphi_3^*$), for $\tfrac13 \le z \le \tfrac12$ by $c = z$, and for $\tfrac12 < z \le 1$ by any $c \ge 1 - z$ (for $z \le \tfrac16$, all values of $c$ give probability zero). Similar remarks apply to the problems considered in the next three sections. For example, $E_U\{\sup_x |g_c(x) - x|\}$ in the above example is minimized by $c = [33 - 3(17)^{1/2}]/52 \approx 0.397$. Similar calculations are more easily made in the case studied in Section 4 (where the distribution of the maximum deviation need not be calculated), and such calculations may be found in the reference cited at the end of that section.
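The $n = 3$ example can be explored numerically. The sketch below computes $\sup_x |g_c(x) - x|$ exactly for the step function $g_c$ (on each interval of constancy the supremum is attained at an endpoint), confirms that $c = 1/3$ reproduces the sample d.f. $\varphi_3^*$, and estimates $E_U\{\sup_x |g_c(x) - x|\}$ on a grid of $c$ by simulation. The same samples are used for every $c$, so that comparisons between values of $c$ are not swamped by noise. The Monte Carlo minimizer should land near the value $\approx 0.397$ quoted above; the sample size, seed and grid are arbitrary.

```python
import math
import random

def sup_dev_gc(z, c):
    """sup_x |g_c(x) - x| for uniform F; g_c steps 0 -> c -> 1-c -> 1 at z1 <= z2 <= z3."""
    z1, z2, z3 = z
    return max(z1,                                # [0, z1): g = 0
               abs(c - z1), abs(c - z2),          # [z1, z2): g = c
               abs(1 - c - z2), abs(1 - c - z3),  # [z2, z3): g = 1 - c
               1 - z3)                            # [z3, 1]: g = 1

def ks_n3(z):
    """sup_x |S_3(x) - x| via the usual order-statistic formula."""
    return max(max(i / 3 - x, x - (i - 1) / 3) for i, x in enumerate(z, start=1))

rng = random.Random(3)
samples = [sorted(rng.random() for _ in range(3)) for _ in range(40_000)]
grid = [round(0.30 + 0.01 * i, 2) for i in range(21)]  # c = 0.30, 0.31, ..., 0.50
risk = {c: sum(sup_dev_gc(z, c) for z in samples) / len(samples) for c in grid}
best_c = min(risk, key=risk.get)
print(best_c, (33 - 3 * math.sqrt(17)) / 52)
```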


4. Other loss functions which are functions of distance. In this section we show that the asymptotic minimax character of $\varphi_n^*$ proved in Section 3 may be extended to a broad class of weight functions. It turns out that it is unnecessary to start anew in order to prove this; the class of weight functions considered in Section 3 (see below) is the basic class in the sense that the minimax character relative to many other weight functions may be concluded from the results of Section 3 and the integrability result given in Corollary 2. It is clear that the method of attack used here, i.e., of carrying out the detailed proof of the minimax character for the basic class of weight functions and then extending to other weight functions, can be stated as a general theorem to apply to other statistical problems; we shall not bother to state this obvious extension in a general setting.

Throughout this section $W$ will represent any nonnegative function defined on the nonnegative reals which is nondecreasing in its argument, not identically zero (the case $W \equiv 0$ is trivial), and which satisfies

$$\int_0^\infty W(r)\, r e^{-2r^2}\, dr < \infty. \tag{4.1}$$

The main result of this section is the following:





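Theorem 4. Under the above assumptions on $W$,

$$\lim_{n\to\infty} \frac{\sup_{F\in\mathfrak{F}_c} \int W(r)\, d_r P_F\{\sup_x |S_n(x) - F(x)| \le r/\sqrt{n}\}}{\inf_{\varphi\in D_n} \sup_{F\in\mathfrak{F}_c} \int W(r)\, d_r P_{F,\varphi}\{\sup_x |g(x) - F(x)| \le r/\sqrt{n}\}} = 1. \tag{4.2}$$

As in Section 3 (and for the same reason), an immediate corollary is

Corollary 4. The result of Theorem 4 holds if $\mathfrak{F}_c$ is replaced by $\mathfrak{F}$ in its statement.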
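Proof of Theorem 4. By a reduction like that of Section 3, it is seen that (4.2) will be proved if, for the sequence $\{\xi_{kn}\}$ of Section 3, we can prove the following three statements, (4.3), (4.4), and (4.5):

$$\begin{aligned}
&\lim_{n\to\infty} \inf_{\varphi\in D_n} \int \Big[\int W(r)\, d_r P_{F,\varphi}\Big\{\max_{a\in A_k} |g(a) - F(a)| \le r/\sqrt{n}\Big\}\Big]\, d\xi_{kn} \\
&\quad= \lim_{n\to\infty} \int \Big[\int W(r)\, d_r P_F\Big\{\max_{a\in A_k} |S_n(a) - F(a)| \le r/\sqrt{n}\Big\}\Big]\, d\xi_{kn}
\end{aligned} \tag{4.3}$$

for each positive integer $k$;

$$\begin{aligned}
&\lim_{n\to\infty} \int \Big[\int W(r)\, d_r P_F\Big\{\max_{a\in A_k} |S_n(a) - F(a)| \le r/\sqrt{n}\Big\}\Big]\, d\xi_{kn} \\
&\quad= \lim_{n\to\infty} \int W(r)\, d_r P_U\Big\{\max_{a\in A_k} |S_n(a) - a| \le r/\sqrt{n}\Big\}
\end{aligned} \tag{4.4}$$

for each positive integer $k$;

$$\begin{aligned}
&\lim_{k\to\infty}\lim_{n\to\infty} \int W(r)\, d_r P_U\Big\{\max_{a\in A_k} |S_n(a) - a| \le r/\sqrt{n}\Big\} \\
&\quad= \lim_{n\to\infty} \int W(r)\, d_r P_U\Big\{\sup_{0\le x\le 1} |S_n(x) - x| \le r/\sqrt{n}\Big\} < \infty.
\end{aligned} \tag{4.5}$$

(This includes, of course, proving the existence of the indicated limits.)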

Firstly, (4.5) is an immediate consequence of (4.1), (2.4), (2.2), the continuity of $G$ and of the d.f. $\lim_{n\to\infty} G_{k,n}$, and of Corollary 2.

In order to prove (4.4), we note first that, for fixed $k$ and any $F \in \mathfrak{F}_c$, we have (similarly to (2.2)) the inequality $P_F\{\max_{a\in A_k} |S_n(a) - F(a)| \le r/\sqrt{n}\} \ge G_n(r)$. Hence, by Corollary 2, the integral with respect to $r$ on the left side of (4.4) is bounded uniformly in $n$ and $F$. On the other hand, given any $\epsilon > 0$, there exists an integer $N_0$ such that, for $n > N_0$, $\xi_{kn}$ assigns probability at least $1 - \epsilon$ to a set of $F$ for which the expressions $P_F\{\ \}$ and $P_U\{\ \}$ of (4.4) differ by less than $\epsilon$ for all $r$ (this rests on the continuity in $\pi$, for $\pi$ in a neighborhood of $V_k$, of the normal approximation (for large $n$) to the joint distribution of the random variables $\sqrt{n}(Y_i^{(n)} - p_i)$, $1 \le i \le k$). Since $P_U\{\ \}$ is continuous in $r$, (4.4) follows.







Finally, we must prove (4.3). Consider any fixed $k$. Write $P_n^*(r; x^{(n)}, \varphi)$ for the probability, calculated according to the a posteriori probability distribution of $\pi$ (given that $X^{(n)} = x^{(n)}$ and when $\xi_{kn}$ is the a priori probability measure on $B_k$) and the probability measure $\varphi(\cdot\,; x^{(n)})$ on $G$ (where $\varphi \in D_n$ and perhaps $\varphi = \varphi_n^*$), of the set of $(g, \pi)$ in $G \times B_k$ for which $\max_{a\in A_k} |g(a) - F_\pi(a)| \le r/\sqrt{n}$. If (4.3) is false, there exists a value $\epsilon > 0$ such that, for every positive $N$, there is an $n > N$ and a $\varphi_n \in D_n$ for which (the operation $E_k$ being as defined in Section 3)

$$E_k \int W(r)\, d_r P_n^*(r; X^{(n)}, \varphi_n) < E_k \int W(r)\, d_r P_n^*(r; X^{(n)}, \varphi_n^*) - 2\epsilon. \tag{4.6}$$

It is clear from the preceding paragraphs that there is a real number $q > 0$ such that $W(q) > 0$ and

$$E_k \int_q^\infty W(r)\, d_r P_n^*(r; X^{(n)}, \varphi_n^*) < \epsilon \tag{4.7}$$

for all $n$. Write $W_q(r) = \min(W(r), W(q))$. Then (4.6) and (4.7) imply

$$E_k \int_0^\infty W_q(r)\, d_r \big\{P_n^*(r; X^{(n)}, \varphi_n) - P_n^*(r; X^{(n)}, \varphi_n^*)\big\} < -\epsilon. \tag{4.8}$$

Since $W_q(r) \le W(q)$, the integral on the left side of (4.8) is $\ge -W(q)$. Hence, (4.8) implies that, with probability at least $\epsilon/W(q)$ (under (3.6) and (3.4)), $X^{(n)}$ will be such that

$$\int_0^\infty W_q(r)\, d_r \big\{P_n^*(r; X^{(n)}, \varphi_n) - P_n^*(r; X^{(n)}, \varphi_n^*)\big\} < -\epsilon. \tag{4.9}$$

Let $\epsilon' = \epsilon/2W(q)$. The discussion of the previous paragraph shows that we can find an $R^*$ and $M$ such that, for $n > M$, the probability (under (3.6) and (3.4)) will be $> 1 - \epsilon'$ that $X^{(n)}$ will be such that

$$P_n^*(R^*; X^{(n)}, \varphi_n^*) > 1 - \epsilon'. \tag{4.10}$$

Let $\gamma_n = \sup_r \{P_n^*(r; X^{(n)}, \varphi_n) - P_n^*(r; X^{(n)}, \varphi_n^*)\}$. We shall show below that (4.9) implies

$$\gamma_n > \epsilon'. \tag{4.11}$$


Then (4.10) and (4.11) (the latter of which is an event of probability at least $2\epsilon'$ according to (3.6) and (3.4)) will imply that for each $N > M$ there is an $n > N$ and a $\varphi_n \in D_n$ for which, with probability $> \epsilon'$ according to (3.6) and (3.4), $X^{(n)}$ will be such that

$$P_n^*(r; X^{(n)}, \varphi_n) - P_n^*(r; X^{(n)}, \varphi_n^*) > \epsilon' \tag{4.12}$$

for some $r$ with $0 < r < R^*$ (here $r$ depends on $n$, $\varphi_n$, $X^{(n)}$). This contradicts the fact that, with probability $1 - O(k^5 n^{-1/4})$ according to (3.6) and (3.4), the region $T_r$ of the last paragraph of Section 3 was seen to maximize with respect to $\rho$ (uniformly in $0 \le r \le R^*$), to within an (added) error of $O(n^{-1/16})$, the a posteriori probability of $T_r + \rho$. Thus, it remains only to prove that (4.9) implies (4.11). For fixed $n$, $\varphi_n$, $X^{(n)}$, abbreviate the bracketed expression in (4.9) as $B(r) - C(r)$. Let


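$$B^*(r) = \begin{cases} 0 & \text{if } r < 0, \\ \min\big(C(r) + \gamma_n, 1\big) & \text{if } r \ge 0. \end{cases} \tag{4.13}$$

Clearly, $B(r) \le B^*(r)$. Hence, since $W_q(r)$ is nondecreasing in $r$, we have

$$\int W_q(r)\, dB(r) \ge \int W_q(r)\, dB^*(r). \tag{4.14}$$

Let $\alpha$ be the infimum of values $r$ for which $B^*(r) = 1$. From (4.9), (4.14), and the fact that $B^*(r) - C(r)$ is constant for $0 \le r < \alpha$, we obtain

$$\begin{aligned}
\epsilon &< \int_0^\infty W_q(r)\, d\big(C(r) - B(r)\big) \\
&\le \int_0^\infty W_q(r)\, d\big(C(r) - B^*(r)\big) \\
&\le W_q(\alpha)\,\gamma_n + \int_{\alpha+}^\infty W_q(r)\, dC(r) \\
&\le 2W(q)\,\gamma_n,
\end{aligned} \tag{4.15}$$

which proves (4.11) and thus completes the proof of Theorem 4.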
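By Corollary 2, the numerator of (4.2) tends to $\int W(r)\, dG(r)$. For smooth $W$ with $W(0) = 0$ this equals $\int_0^\infty W'(r)[1 - G(r)]\, dr$, which is easy to evaluate numerically. The sketch below does so for $W(r) = r$ and $W(r) = r^2$ (both satisfy (4.1)) and compares with the known values $E K = (\pi/2)^{1/2}\log 2$ and $E K^2 = \pi^2/12$ for a random variable $K$ with d.f. $G$. To keep $G$ accurate near $r = 0$, it switches for $r < 1$ to the equivalent theta-function form $G(r) = (2\pi)^{1/2} r^{-1} \sum_{m\ge1} \exp(-(2m-1)^2\pi^2/8r^2)$; that identity, the truncation after 20 terms and the integration grid are our choices.

```python
import math

def kolmogorov_G(r, terms=20):
    """G(r): theta-function form for r < 1, alternating series for r >= 1."""
    if r <= 0:
        return 0.0
    if r < 1.0:
        c = math.pi ** 2 / (8.0 * r * r)
        return math.sqrt(2 * math.pi) / r * sum(math.exp(-(2 * m - 1) ** 2 * c)
                                                for m in range(1, terms + 1))
    return 1.0 - 2.0 * sum((-1) ** (m + 1) * math.exp(-2.0 * m * m * r * r)
                           for m in range(1, terms + 1))

def limiting_risk(w_prime, upper=6.0, steps=20_000):
    """Trapezoid-rule value of int_0^upper W'(r) (1 - G(r)) dr, i.e. int W dG when W(0) = 0."""
    h = upper / steps
    vals = [w_prime(i * h) * (1.0 - kolmogorov_G(i * h)) for i in range(steps + 1)]
    return h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))

mean_K = limiting_risk(lambda r: 1.0)       # W(r) = r
mean_K2 = limiting_risk(lambda r: 2.0 * r)  # W(r) = r^2
print(mean_K, math.sqrt(math.pi / 2) * math.log(2))
print(mean_K2, math.pi ** 2 / 12)
```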


5. Integral weight functions. In this section we consider weight functions $W_n^*$ arising from integration of a function $W$ in the following manner:

$$W_n^*(F, g) = \int W\big(\sqrt{n}\,[g(x) - F(x)], F(x)\big)\, dF(x). \tag{5.1}$$

Here $W(y, z)$, which is defined for $y$ real and $0 \le z \le 1$, is nonnegative and is symmetric in $y$ and nondecreasing in $y$ for $y \ge 0$; it may be thought of as a measure of the contribution to $W_n^*$ arising from a deviation of $y/\sqrt{n}$ of the estimator $g$ from the true $F$ at an argument $x$ for which $F(x) = z$. Typical $W$'s which might be of interest are $W(y, z) = |y|^p$ or 0 according to whether or not $a \le z \le b$ (here $p > 0$ and $0 \le a < b \le 1$), $W(y, z) = 0$ or 1 according to whether $|y| \le a$ or $|y| > a$ where $a$ is a suitably chosen constant, $W(y, z) = y^2/z(1 - z)$, etc.
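For the last weight function listed, $W(y, z) = y^2/z(1 - z)$, the risk of $\varphi_n^*$ is finite and in fact equal to 1 for every $n$: since $E_U[S_n(x) - x]^2 = x(1 - x)/n$, Fubini gives $\int_0^1 1\, dx = 1$, and taking $F$ uniform loses nothing because this risk does not depend on $F \in \mathfrak{F}_c$. The integral $n\int (S_n - x)^2/[x(1 - x)]\, dx$ is the Anderson-Darling statistic. The sketch below evaluates it with its standard computing formula $A^2 = -n - n^{-1}\sum (2i - 1)[\log u_{(i)} + \log(1 - u_{(n+1-i)})]$ (a formula brought in from outside the paper) and averages over simulated samples.

```python
import math
import random

def anderson_darling(u):
    """n * integral over [0, 1] of (S_n(x) - x)^2 / (x (1 - x)) dx for a sample in (0, 1)."""
    xs = sorted(u)
    n = len(xs)
    s = sum((2 * i - 1) * (math.log(xs[i - 1]) + math.log(1.0 - xs[n - i]))
            for i in range(1, n + 1))
    return -n - s / n

def open_unit(rng):
    """Uniform draw from the open interval (0, 1); random() can return 0.0."""
    while True:
        v = rng.random()
        if v > 0.0:
            return v

rng = random.Random(11)
n, reps = 20, 20000
risk = sum(anderson_darling([open_unit(rng) for _ in range(n)]) for _ in range(reps)) / reps
print(risk)  # close to 1, the exact value of r_n(F, phi_n^*) for this W
```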

We now turn to considerations of the asymptotic minimax character of $\varphi_n^*$ with respect to a sequence of risk functions $r_n(F, \varphi) = E_{F,\varphi} W_n^*(F, g)$, where $\varphi \in D_n$. (The remainder of the present paragraph will be somewhat heuristic in order to compare the present problem with those of Sections 3 and 4; the statement and proof of Theorem 5 begin in the next paragraph.) These considerations are much easier than those of the previous two sections, since in obtaining a Bayes solution with respect to the a priori probability measure $\xi_{kn}$ of Section 3 it will suffice (as will be seen below) to minimize with respect to $\varphi$, for each fixed $x$ (more precisely, for each irrational $x$),


$$r_{kn}(x, \varphi, t^{(n)}) = \int_{B_k} E_\varphi W\big(\sqrt{n}\,[g(x) - F(x, \pi)], F(x, \pi)\big)\, d\tau_n(\pi; x, t^{(n)}); \tag{5.2}$$

here $B_k$ is as in Section 3, $F(x, \pi)$ denotes the distribution function of (3.3) for a given value of $\pi = (p_1, \dots, p_{k+1})$, and for any measurable subset $B$ of $B_k$ we set

$$\tau_n(B; x, t^{(n)}) = \frac{\int_B f(x, \pi)\, P_\pi\{t^{(n)}\}\, d\xi_{kn}(\pi)}{\int_{B_k} f(x, \pi)\, P_\pi\{t^{(n)}\}\, d\xi_{kn}(\pi)}, \tag{5.3}$$

where $\xi_{kn}$ is given by (3.6) of Section 3, $f(x, \pi) = dF(x, \pi)/dx$ (this derivative exists for $x$ irrational), and $P_\pi\{t^{(n)}\} = P_\pi\{T_i^{(n)} = t_i,\ 1 \le i \le k+1\}$ is the probability function defined in (3.4). (Of course, $\varphi$ in (5.2) may randomize over many $g$, which accounts for the presence of the $E_\varphi$ operation.) Thus, present considerations will involve only the obtaining of a (univariate) normal approximation to the a posteriori distribution (more precisely, to a slight modification (5.3) of it) of $F(x, \pi)$ for fixed irrational $x$, which is much easier than the multivariate approximation (3.20) which it was necessary to obtain in Section 3. (We shall actually use (3.20), which implies easily the needed univariate approximation; however, the latter could have been obtained more easily directly.)

The above remarks will be made precise in what follows. We hereafter denote the infimum of $r_{kn}(x, \varphi, t^{(n)})$ over all $\varphi$ in $D_n$ by $r_{kn}(x, t^{(n)})$. The set of reals

$$\{z \mid 0 < z < \epsilon \ \text{or}\ 1 - \epsilon < z < 1\}$$

will be denoted by $J_\epsilon$ for $0 < \epsilon < \tfrac12$.
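At a cell boundary $x = i/(k+1)$ we have $F(x, \pi) = p_1 + \cdots + p_i$, and under the Dirichlet a posteriori distribution implied by (3.6) and (3.4) this sum has a Beta distribution, so the univariate normal approximation mentioned above can be checked exactly. The sketch ignores the reweighting by $f(x, \pi)$ in (5.3), which concerns irrational $x$ inside a cell, and uses illustrative counts of our own choosing. It shows that $n$ times the posterior variance of $F(x, \pi)$ is close to $y(1 - y)$ with $y = S_n(x)$, the variance of the classical estimator.

```python
import math

def posterior_F_at_boundary(t, m, i):
    """Mean and variance of p_1 + ... + p_i under the Dirichlet(m + 1 + t_j) posterior,
    i.e. of F(i/(k+1), pi); the sum is Beta(a, b) with grouped parameters a and b."""
    a = sum(m + 1 + tj for tj in t[:i])
    b = sum(m + 1 + tj for tj in t[i:])
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, var

k = 3
t = [250_400, 249_700, 250_100, 249_800]  # illustrative cell counts, n = 10**6
n = sum(t)
m = math.floor(n ** 0.25 / k ** 2)        # m_{kn} as defined before (3.6)
y = sum(t[:2]) / n                        # S_n(2/(k+1))
mean, var = posterior_F_at_boundary(t, m, 2)
print(n * var, y * (1 - y))               # nearly equal
print(math.sqrt(n) * abs(mean - y))       # of order m / sqrt(n), small
```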

We now state Theorem 5. Our statement of this theorem is not the most general possible. (The set $J_\epsilon$ may be replaced by other sets where $W(y, z)$ is large, the continuity conditions on $W$ may be weakened by considering continuous approximations to (a more general) measurable $W$, the integrability condition may be weakened, and $W$ may be replaced by a distribution (rather than a density) in $z$ so as to obtain results, e.g., on the estimation of $F$ at a finite number of quantiles.) Rather, it is stated in a form which allows $W$ to be any of the functions which would usually be of interest in applications, e.g., any of those functions given at the end of the first paragraph of this section, etc. (It should be noted that if the assumptions of Theorem 5 below were altered by deleting (5.5) and putting $\epsilon = 0$ in (5.4), then such weight functions as $y^2/z(1 - z)$ would be excluded. The circumlocution of including the condition (5.5) could be avoided in such cases if one could obtain a sufficiently strong bound on

$$P_U\Big\{\sqrt{n}\,[S_n(x) - x] > r\sqrt{x(1 - x)}\Big\}$$

which is independent of $x$. The difficulty of obtaining such an approximation is discussed in [4], p. 285.)

THEOREM 5. Let $W(y,z) \ge 0$ be defined for $0 \le y < \infty$, $0 < z < 1$. Assume that $W(y,z)$ is monotone nondecreasing in $y$ and (to avoid trivialities) that $W(y,z)$ is not almost everywhere zero (in the two-dimensional Lebesgue sense). Suppose further that (a) to every $z'$, $0 < z' < 1$, not belonging to an exceptional set of linear measure zero, and every $\delta > 0$ there corresponds $\varepsilon(\delta, z') > 0$ with the following property: the set of $y$ for which $W(y,z)$ is discontinuous for at least one $z$ satisfying $|z - z'| < \varepsilon(\delta, z')$ has exterior (linear Lebesgue) measure smaller than $\delta$. Suppose also that (b) for each $\varepsilon$ with $0 < \varepsilon < \tfrac12$ there is a function $V(y, \varepsilon)$ such that $W(y,z) \le V(y,\varepsilon)$ for $\varepsilon < z < 1 - \varepsilon$ and $0 \le y < \infty$, and such that

$$\int_0^\infty V(y, \varepsilon)\, y\, e^{-2y^2}\, dy < \infty. \qquad (5.4)$$

Suppose, finally, that (c)

$$\lim_{\varepsilon \to 0}\ \limsup_{n \to \infty} \int_{J_\varepsilon} E_U\, W\big(\sqrt{n}\,[S_n(x) - x],\ x\big)\, dx = 0. \qquad (5.5)$$

Then

$$\lim_{n \to \infty} \frac{\sup_{F \in \mathcal{F}_c} r_n(F, \phi_n^*)}{\inf_{\phi \in D_n}\ \sup_{F \in \mathcal{F}_c} r_n(F, \phi)} = 1. \qquad (5.6)$$

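PROOF. $r_n(F, \phi_n^*)$ is, of course, independent of $F$ for $F$ in $\mathcal{F}_c$. Because of Corollary 2 and the assumptions of Theorem 5, the numerator of (5.6) approaches a finite positive limit, say $L$, as $n \to \infty$. For any $\delta$ with $0 < \delta < L$ we may choose $\varepsilon$ so small that $r_{n,\varepsilon}(F, \phi_n^*)$ tends to a limit $> L - \delta$ as $n \to \infty$. Here $r_{n,\varepsilon}$ is the risk function corresponding to the loss function $W_\varepsilon(y,z)$ defined by

$$W_\varepsilon(y,z) = \begin{cases} W(y,z) & \text{if } z \notin J_\varepsilon, \\ 0 & \text{if } z \in J_\varepsilon. \end{cases} \qquad (5.7)$$

It clearly suffices to prove (5.6) with $r_n$ replaced by $r_{n,\varepsilon}$. Indeed, since $W_\varepsilon \le W$, the denominator of (5.6) for $W$ is at least the corresponding denominator for $W_\varepsilon$, while the limiting numerators differ by less than $\delta$, and $\delta$ is arbitrary. We hereafter drop the subscript $\varepsilon$ on $W_\varepsilon$ and $r_{n,\varepsilon}$. Because of (5.7), what is to be proved may be restated as (5.6) under the continuity assumption (a) on $W$, together with the following assumption, which replaces (5.4) and (5.5): $W(y,z) \le V(y)$ for $0 \le y < \infty$ and $0 < z < 1$, where

$$\int_0^\infty V(y)\, y\, e^{-2y^2}\, dy < \infty. \qquad (5.8)$$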
In what follows we denote (for fixed $k$, $n$, irrational $x$) by $P_x^*\{A\}$ the probability of any event $A$ which is expressed in terms of $T^{(n)}$ when the probability function of $T^{(n)}$ is given by

$$P_x^*\{T_i^{(n)} = t_i,\ 1 \le i \le k+1\} = \frac{1}{d(k,n,x)} \int_{B_k} f(x,\pi)\, P_\pi\{T_i^{(n)} = t_i,\ 1 \le i \le k+1\}\, d\xi_{kn}(\pi), \qquad (5.9)$$

where $P_\pi$ is given by (3.4) and $d(k,n,x)$ is the sum over all $(t_1, \ldots, t_{k+1})$ of the integral on the right side of (5.9). Expectation with respect to the probability function (5.9) will be denoted by $E_x^*$. We now have


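$$\int r_n(F, \phi)\, d\xi_{kn} = \int_0^1 \int_{B_k} E_{\pi,\phi}\, W\big(\sqrt{n}\,[g(x) - F(x,\pi)],\ F(x,\pi)\big)\, f(x,\pi)\, d\xi_{kn}(\pi)\, dx$$
$$= \int_0^1 E_x^*\, r^*_{kn}(x, \phi, T^{(n)})\, d(k,n,x)\, dx, \qquad (5.10)$$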

where the last integration (and each integration which follows) is over irrational $x$. Hence, in order to prove (5.6), it suffices to show that (5.8) and our continuity assumption on $W$ imply that

$$\liminf_{k \to \infty}\ \liminf_{n \to \infty} \int_0^1 E_x^*\, r^*_{kn}(x, T^{(n)})\, d(k,n,x)\, dx \ \ge\ \lim_{n \to \infty} \int_0^1 E_U\, W\big(\sqrt{n}\,[S_n(x) - x],\ x\big)\, dx, \qquad (5.11)$$

since the right side of (5.11) is the limit of the finite positive numerator of (5.6).
Let $x$ be an irrational number, $0 < x < 1$, which is a nonexceptional $z'$ of our continuity assumption (a). For fixed $k$ with $1/(k+1) < \min(x, 1-x)$, we may write $x = (i_0 + t)/(k+1)$ with $1 \le i_0 \le k-1$ and $0 \le t < 1$. Write $q(r, \sigma^2) = (2\pi\sigma^2)^{-1/2} \exp(-r^2/2\sigma^2)$. We shall show the following. Given any $x$ and $k$ as above and any $\varepsilon' > 0$, there is an integer $N = N(\varepsilon', x, k)$ such that $|d(k,n,x) - 1| < \varepsilon'$ for $n > N$. Moreover, for $n > N$, $P_x^*$ assigns probability at least $1 - \varepsilon'$ to a set of $T^{(n)}$ values for which

$$r^*_{kn}(x, T^{(n)}) + \varepsilon' > \int_{-\infty}^{\infty} W(y, x)\, q\big(y,\ x(1-x) + h\big)\, dy, \qquad (5.12)$$

where $h = (t^2 - t)/(k+1)$. But, for fixed irrational and nonexceptional $x$, the right side of (5.12) tends, as $k \to \infty$ (and thus $h \to 0$), to the limit as $n \to \infty$ of the integrand in the right-hand member of (5.11). The integral of this limit is, by (5.8), the same as the right-hand member of (5.11). Thus, using (5.12) and applying Fatou's lemma to the left side of (5.11), we conclude that (5.11) will be proved once we establish the two statements just made, including (5.12).

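For fixed $x$ and $k$ as above and for any $\varepsilon > 0$, $\xi_{kn}$ assigns to the set of $\pi$ for which $|f(x,\pi) - 1| < \varepsilon$ a probability which tends to unity as $n \to \infty$. It follows that $d(k,n,x) \to 1$ as $n \to \infty$. It also follows (noting the relationship between $\xi_{kn}$ and the fee of Section 3) that, for any $\varepsilon > 0$ and for $n$ sufficiently large, $P_x^*$ assigns probability at least $1 - \varepsilon$ to a set of values $t^{(n)}$ of $T^{(n)}$ with the following property. Write $y_i = t_i^{(n)}/n$ and $\eta_i = \sqrt{n}\,(p_i - y_i)$ ($1 \le i \le k$). Then, in a spherical region centered at $0$ in the space of the $\eta_i$ and of probability at least $1 - \varepsilon$ according to $\zeta_n$, the joint density function of the $\eta_i$ according to $\zeta_n(\cdot, x, t^{(n)})$ is at least

$$(1 - \varepsilon)\,(2\pi)^{-k/2} \Big(\prod_{i=1}^{k+1} y_i\Big)^{-1/2} \exp\Big(-\frac12 \sum_{i=1}^{k+1} \eta_i^2 / y_i\Big), \qquad (5.13)$$

where $\eta_{k+1} = \sqrt{n}\,(p_{k+1} - y_{k+1}) = -(\eta_1 + \cdots + \eta_k)$.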
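As a check on the form of (5.13), the sketch below (an illustration of ours, not from the paper; function names and test values are hypothetical) verifies for small $k$ that, with $\varepsilon = 0$, the expression is the $N(0, \Sigma)$ density of $(\eta_1, \ldots, \eta_k)$ with the multinomial covariance $\Sigma = \mathrm{diag}(y_1, \ldots, y_k) - yy^{\mathsf T}$. It checks, in exact rational arithmetic, that $\det \Sigma = \prod_{i=1}^{k+1} y_i$ and that $\eta^{\mathsf T}\Sigma^{-1}\eta = \sum_{i=1}^{k+1} \eta_i^2/y_i$ with $\eta_{k+1} = -(\eta_1 + \cdots + \eta_k)$.

```python
from fractions import Fraction

def multinomial_cov(y):
    """k x k matrix diag(y_1..y_k) - y y^T for the first k of the k+1 cells."""
    k = len(y) - 1
    return [[(y[i] if i == j else 0) - y[i] * y[j] for j in range(k)]
            for i in range(k)]

def det(m):
    """Exact determinant by Gaussian elimination over Fractions."""
    a = [row[:] for row in m]
    n = len(a)
    result = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result

def solve(m, b):
    """Solve m x = b exactly by Gauss-Jordan elimination (m nonsingular)."""
    n = len(m)
    a = [row[:] + [b[i]] for i, row in enumerate(m)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if a[r][col] != 0)
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(n):
            if r != col:
                factor = a[r][col] / a[col][col]
                for c in range(col, n + 1):
                    a[r][c] -= factor * a[col][c]
    return [a[i][n] / a[i][i] for i in range(n)]

def check_5_13(y, eta):
    """Compare det(Sigma) and eta' Sigma^{-1} eta with the terms appearing in (5.13)."""
    sigma = multinomial_cov(y)
    prod_y = Fraction(1)
    for yi in y:
        prod_y *= yi
    quad_general = sum(e * xi for e, xi in zip(eta, solve(sigma, eta)))
    eta_full = list(eta) + [-sum(eta)]   # eta_{k+1} = -(eta_1 + ... + eta_k)
    quad_5_13 = sum(e * e / yi for e, yi in zip(eta_full, y))
    return det(sigma) == prod_y, quad_general == quad_5_13

y = [Fraction(1, 5), Fraction(3, 10), Fraction(1, 10), Fraction(2, 5)]  # k = 3
eta = [Fraction(1), Fraction(-2), Fraction(1, 2)]
print(check_5_13(y, eta))  # (True, True)
```

Both identities are instances of the matrix determinant lemma and the Sherman-Morrison formula applied to $\mathrm{diag}(y) - yy^{\mathsf T}$.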


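Now, in the notation of Section 3, for $1 \le i \le k$,

$$F\big(i/(k+1), \pi\big) = p_1 + \cdots + p_i.$$

Hence, if $T_i^{(n)} = n y_i$ ($1 \le i \le k$), we have (because of the form of (3.3))

$$p_1 + \cdots + p_{i_0} + t\,p_{i_0+1} = (y_1 + \cdots + y_{i_0} + t\,y_{i_0+1}) + (\eta_1 + \cdots + \eta_{i_0} + t\,\eta_{i_0+1})/\sqrt{n}.$$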


Now, $T_i^{(n)}/n$ tends in probability (according to $P_x^*$) to $1/(k+1)$, and expression (5.13) with $\varepsilon = 0$ is continuous in the $y_i$ (in the region where all $y_i > 0$). Suppose we had $\varepsilon = 0$ in (5.13), assumed the validity of this expression for all values of the $\eta_i$, and put all $y_i = 1/(k+1)$. Then, according to (5.13), $(\eta_1 + \cdots + \eta_{i_0})$ and $\eta_{i_0+1}$ would have a bivariate normal density function with means zero and covariance matrix

$$\frac{1}{(k+1)^2} \begin{pmatrix} i_0(k+1-i_0) & -i_0 \\ -i_0 & k \end{pmatrix}. \qquad (5.14)$$

The corresponding density function of $\eta_1 + \cdots + \eta_{i_0} + t\,\eta_{i_0+1}$ would then be normal with mean zero and variance

$$\frac{i_0(k+1-i_0) - 2t\,i_0 + t^2 k}{(k+1)^2} = x(1-x) + h. \qquad (5.15)$$


Hence, carrying through this last argument with the actual form of (5.13) and its region of validity, we conclude the following. For any $\varepsilon'' > 0$ and for $n$ sufficiently large, $P_x^*$ assigns probability at least $1 - \varepsilon''$ to a set of values $t^{(n)}$ of $T^{(n)}$ with this property: on a real interval centered at $0$ and of probability at least $1 - \varepsilon''$ according to $\zeta_n(\cdot, x, t^{(n)})$, this last measure induces a distribution function $J$ for

$$\lambda = \sqrt{n}\,\big[F(x,\pi) - (y_1 + \cdots + y_{i_0} + t\,y_{i_0+1})\big] \quad \text{(say)}$$

whose absolutely continuous component has a density (the derivative of $J$) of magnitude at least

$$(1 - \varepsilon'')\, q\big(\lambda,\ x(1-x) + h\big) \qquad (5.16)$$

almost everywhere on this interval of $\lambda$-values.
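The algebra behind (5.14) and (5.15) can be checked exactly. The sketch below is our own illustration (function names are hypothetical). It computes the variance of $\eta_1 + \cdots + \eta_{i_0} + t\,\eta_{i_0+1}$ directly from the multinomial variances $y(1-y)$ and covariances $-y^2$, with $y = 1/(k+1)$. It then compares the result with both sides of (5.15), taking $h = (t^2 - t)/(k+1)$ and $x = (i_0 + t)/(k+1)$.

```python
from fractions import Fraction

def partial_sum_variance(k, i0, t):
    """Variance of eta_1 + ... + eta_{i0} + t * eta_{i0+1} when every y_i = 1/(k+1).

    Uses Var(eta_i) = y(1 - y) and Cov(eta_i, eta_j) = -y^2 for i != j.
    """
    y = Fraction(1, k + 1)
    coeffs = [Fraction(1)] * i0 + [t]      # weights on eta_1, ..., eta_{i0+1}
    var = Fraction(0)
    for a, ca in enumerate(coeffs):
        for b, cb in enumerate(coeffs):
            cov = y * (1 - y) if a == b else -y * y
            var += ca * cb * cov
    return var

def sides_of_5_15(k, i0, t):
    """Left side and right side x(1 - x) + h of (5.15)."""
    lhs = (Fraction(i0 * (k + 1 - i0)) - 2 * t * i0 + t * t * k) / (k + 1) ** 2
    x = (i0 + t) / (k + 1)
    h = (t * t - t) / (k + 1)
    return lhs, x * (1 - x) + h

cases = [(3, 1, Fraction(1, 2)), (10, 4, Fraction(1, 3)), (50, 17, Fraction(7, 9))]
for k, i0, t in cases:
    lhs, rhs = sides_of_5_15(k, i0, t)
    print(k, i0, t, partial_sum_variance(k, i0, t) == lhs == rhs)  # True each time
```

Note that $h \le 0$ and $|h| \le 1/[4(k+1)]$, which is why $h \to 0$ as $k \to \infty$.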







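Next, we note that

$$W\big(\sqrt{n}\,[g(x) - F(x,\pi)],\ F(x,\pi)\big) = W\big(\rho - \lambda,\ x + \mu + \lambda/\sqrt{n}\big), \qquad (5.17)$$

where $\rho = \sqrt{n}\,[g(x) - (y_1 + \cdots + y_{i_0} + t\,y_{i_0+1})]$ and $\mu = -x + (y_1 + \cdots + y_{i_0} + t\,y_{i_0+1})$.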


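For fixed $x$ and $k$ as above, denote by $a$ the right side of (5.12). Let $\beta$ be such that the right side of (5.12) is at least $a - \varepsilon'/4$ if the limits of integration are changed to $(-\beta, \beta)$. Let $c = W(\beta, x)$. Let the $\delta$ of our assumption (a) be

$$\delta = \frac{\varepsilon'}{8c\, q\big(0,\ x(1-x) + h\big)},$$

and let $z' = x$, where $x$ is nonexceptional. Consider the set $0 \le y \le \beta$, $|z - x| \le \varepsilon(\delta, x)$, minus a suitable countable set of open intervals of total length $< \delta$ covering the points of discontinuity. This set is closed and bounded, so $W$ is uniformly continuous on it. Hence there is a value $\varepsilon > 0$ such that $W(y,z) \ge W(y,x) - \varepsilon'/4$ for $|x - z| \le \varepsilon$ and $0 \le y \le \beta$, $y$ not in the excluded set. If $0 < y \le \beta$ and $y$ is in the excluded set, then $y$ is in a maximal subinterval of the excluded set of one of two forms: $a < y < b$ with $a > 0$, or $0 < y < b$. Define $\bar W(y,x) = W(a,x)$ in the former case and $\bar W(y,x) = 0$ in the latter. If $0 < y \le \beta$ but $y$ is not excluded, define $\bar W(y,x) = W(y,x)$. If $y > \beta$, define $\bar W(y,x) = W(\beta,x)$. Finally, set $\bar W(-y,x) = \bar W(y,x)$. The function $\bar W$ so defined is symmetric in $y$, nondecreasing in $y$ for $y \ge 0$, and has the property that

$$W(y,z) \ge \bar W(y,x) - \varepsilon'/4 \quad \text{for } |x - z| \le \varepsilon \text{ and all } y, \qquad (5.18)$$

and also that

$$\int_{-\beta}^{\beta} \bar W(y,x)\, q\big(y,\ x(1-x) + h\big)\, dy \ge a - \varepsilon'/2. \qquad (5.19)$$

For (5.19), note that on $(-\beta, \beta)$ the excluded set has measure less than $2\delta$. There, $0 \le W - \bar W \le c$ and $q \le q(0, x(1-x) + h)$. Passing from $W$ to $\bar W$ therefore costs at most $2c\delta\, q(0, x(1-x) + h) = \varepsilon'/4$.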


Now, let $N = N(\varepsilon', x, k)$ be chosen as follows, with $\varepsilon'' = \varepsilon'/[4(a+1)]$. For $n > N$, the conclusion (5.16) holds with the $\lambda$-interval including the interval $(-\beta, \beta)$, and $|d(k,n,x) - 1| < \varepsilon'$. Write $\bar\mu$ for the random variable defined by putting $T_i^{(n)}/n$ for $y_i$ in the definition of $\mu$. Since $\bar\mu$ tends to zero in probability (according to $P_x^*$) as $n \to \infty$, we may also suppose $N$ to be such that, for $n > N$, $P_x^*\{|\bar\mu| + \beta/\sqrt{n} < \varepsilon\} \ge 1 - \varepsilon''$. Next, we recall the statement made immediately after Lemma 1: for $n = 1$, the conclusion of Lemma 1 holds if the normal probability density is replaced by one truncated at $(-\beta, \beta)$. We also note that the integral (with respect to $\lambda$) of this truncated density multiplied by $\bar W(\rho - \lambda, x)$ is minimized at $\rho = 0$. This is easily seen by an argument like that used to deduce (4.11) from (4.9). We note, as in previous sections, that if (5.12) is true under the restriction to nonrandomized $\phi$ (in the definition of $r^*$), then (5.12) is a fortiori true without this restriction. Thus, from (5.2), (5.16), (5.17), (5.18), and (5.19), we have for $n > N$ that, with $P_x^*$-probability at least

$$1 - 2\varepsilon'' > 1 - \varepsilon', \qquad (5.20)$$

$T^{(n)}$ will be such that

$$\begin{aligned} r^*_{kn}(x, T^{(n)}) &\ge \inf_\rho \int_{-\beta}^{\beta} W\big(\rho - \lambda,\ x + \bar\mu + \lambda/\sqrt{n}\big)\,(1 - \varepsilon'')\, q\big(\lambda,\ x(1-x) + h\big)\, d\lambda \\ &\ge (1 - \varepsilon'') \inf_\rho \int_{-\beta}^{\beta} \big[\bar W(\rho - \lambda, x) - \varepsilon'/4\big]\, q\big(\lambda,\ x(1-x) + h\big)\, d\lambda \\ &= (1 - \varepsilon'') \int_{-\beta}^{\beta} \big[\bar W(-\lambda, x) - \varepsilon'/4\big]\, q\big(\lambda,\ x(1-x) + h\big)\, d\lambda \\ &\ge (1 - \varepsilon'')(a - 3\varepsilon'/4) > a - \varepsilon'. \end{aligned} \qquad (5.21)$$


This completes the proof of (5.12) and thus of Theorem 5. 
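The step used in (5.21), that the truncated integral is minimized at $\rho = 0$, can be illustrated numerically. This sketch is ours and is not part of the proof. It takes a hypothetical weight $\bar W(y) = \min(y^2, 1)$, which is symmetric and nondecreasing in $|y|$. It uses the normal density $q$ truncated to $(-\beta, \beta)$, with illustrative values $\beta = 2$ and variance $0.2$ standing in for $x(1-x) + h$. The integral is evaluated by the midpoint rule on a grid of $\rho$ values.

```python
import math

def q(r, var):
    """Normal density with mean 0 and variance var, i.e. q(r, sigma^2) of (5.12)."""
    return math.exp(-r * r / (2 * var)) / math.sqrt(2 * math.pi * var)

def w_bar(y):
    """Hypothetical symmetric weight, nondecreasing in |y| (not from the paper)."""
    return min(y * y, 1.0)

def truncated_risk(rho, beta=2.0, var=0.2, steps=4000):
    """Midpoint-rule value of the integral over (-beta, beta) of w_bar(rho - lam) q(lam, var)."""
    width = 2 * beta / steps
    total = 0.0
    for j in range(steps):
        lam = -beta + (j + 0.5) * width
        total += w_bar(rho - lam) * q(lam, var) * width
    return total

values = {round(0.1 * m, 1): truncated_risk(0.1 * m) for m in range(-10, 11)}
best = min(values, key=values.get)
print(best)  # 0.0
```

Only symmetry and unimodality of the truncated density and monotonicity of $\bar W$ in $|y|$ matter here; convexity of $\bar W$ is not needed.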

We have not stated a corollary to Theorem 5 of the type given after Theorems 3 and 4. For $F \in \mathcal{F} - \mathcal{F}_c$, a weight function of the form (5.1) seems less meaningful, because the loss contributed at a saltus $x$ of $F$ is measured by $W(y,z)$, where $z = F(x+0)$. There are also certain technical difficulties, in that the numerator of (5.6) need no longer be the same if $\mathcal{F}_c$ is replaced by $\mathcal{F}$. We shall not bother with the circumlocutions (e.g., additional restrictions on $W$) necessary to obtain a corollary from Theorem 5 in the same trivial manner as such corollaries were obtained from Theorems 3 and 4.

Theorem 5 implies certain much weaker results which, for special forms of $W$, may also be obtained from results of Aggarwal [6]. He considers only the class $C_n$ of procedures which with probability one set $g(x) = c_j^{(n)}$ for $Z_j^{(n)} \le x < Z_{j+1}^{(n)}$, where the $\{Z_j^{(n)}\}$ are the ordered values of the $\{X_j^{(n)}\}$. (Such procedures have constant risk for $F \in \mathcal{F}_c$ and $W$ of the form (5.1).) For the special functions $W(y,z) = |y|^r$ and $W(y,z) = |y|^r/[z(1-z)]$ ($r$ a positive integer), he obtains the best $c_j^{(n)}$ explicitly in a few cases and in the other cases characterizes them as the solutions of certain equations. In the former cases $\phi_n^*$ may be seen to be asymptotically best in $C_n$. This result is an immediate consequence of Theorem 5, where the result is proved for the class $D_n$ of all procedures, of which $C_n$ is a small subclass.
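To make the class $C_n$ concrete, here is a minimal sketch of ours. The function name and the convention $Z_0^{(n)} = -\infty$, $Z_{n+1}^{(n)} = +\infty$ are assumptions made for the illustration. Any choice of constants $c_0, \ldots, c_n$ gives a member of $C_n$. The choice $c_j = j/n$ reproduces the sample distribution function $S_n$ used by $\phi_n^*$.

```python
import bisect

def make_cn_estimator(sample, c):
    """Step-function estimator in the class C_n (sketch).

    sample: observations X_1, ..., X_n; c: constants c_0, ..., c_n.
    Returns g with g(x) = c_j for Z_j <= x < Z_{j+1}, where Z_1 <= ... <= Z_n are
    the ordered observations and (our convention) Z_0 = -inf, Z_{n+1} = +inf.
    """
    z = sorted(sample)
    if len(c) != len(z) + 1:
        raise ValueError("need n + 1 constants c_0, ..., c_n")

    def g(x):
        j = bisect.bisect_right(z, x)   # number of ordered values Z_i <= x
        return c[j]

    return g

sample = [0.42, 0.07, 0.91, 0.33, 0.58]
n = len(sample)
empirical = make_cn_estimator(sample, [j / n for j in range(n + 1)])  # c_j = j/n gives S_n
print([empirical(x) for x in (0.0, 0.07, 0.4, 0.6, 1.0)])  # [0.0, 0.2, 0.4, 0.8, 1.0]
```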


6. Other loss functions; multinomial estimation problems. The results obtained in the previous three sections may be extended to a more general class of loss functions to which the same methods of proof may be seen to apply. For example, in Sections 3 and 4 we could consider the maximum deviation over a set of $x$ values for which $F(x)$ is in a specified subset of the unit interval (this will involve techniques like those used in Section 5). The formulation of Theorem 5 already includes weight functions which may (e.g.) vanish for certain values of $F(x)$. Other modifications (e.g., to consider a finite set of points) are mentioned in the paragraph preceding Theorem 5. We may also consider (in Section 4) loss functions such as $W_1(r_1) + W_2(r_2)$. Here $r_1$ and $r_2$ are the maximum deviations over two (not necessarily disjoint) sets of the type mentioned above, and $W_1$ and $W_2$ are functions of the type considered in Section 4. Linear combinations of loss functions of this last type and the type considered in Section 5 may similarly be treated. In all of the above we may replace $\sup_x |g(x) - F(x)|$ by $\sup_x [\,|g(x) - F(x)|\, h(F(x))\,]$, where $h$ is any nonnegative function (suitably regular), without any difficulty. This includes as a special case maximization over a subset as described above.
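The special case of maximization over a subset can be sketched as follows. The assumptions are ours: $g = S_n$, $F = U$ (so $F(x) = x$ on the unit interval), and $h$ the indicator function of a hypothetical interval $[a, b]$. The supremum then reduces to a finite maximum over the endpoints and the jumps of $S_n$. On the whole unit interval this gives the usual Kolmogorov-Smirnov statistic.

```python
import bisect
import random

def sup_deviation_on_subset(sample, a, b):
    """Sketch: sup over a <= u <= b of |S_n(u) - u| for a sample from U(0, 1).

    This is sup_x |S_n(x) - F(x)| h(F(x)) with F = U and h the indicator of [a, b].
    Between jumps S_n is constant, so |S_n(u) - u| is convex there and the supremum
    is attained or approached at a, at b, or at a jump (value or left limit).
    """
    u = sorted(sample)
    n = len(u)

    def s(x):           # S_n(x)
        return bisect.bisect_right(u, x) / n

    def s_left(x):      # S_n(x-), the left limit
        return bisect.bisect_left(u, x) / n

    cands = [abs(s(a) - a), abs(s(b) - b)]
    if b > a:
        cands.append(abs(s_left(b) - b))
    for v in u:
        if a <= v <= b:
            cands.append(abs(s(v) - v))
        if a < v <= b:
            cands.append(abs(s_left(v) - v))
    return max(cands)

random.seed(1)
sample = [random.random() for _ in range(20)]
full = sup_deviation_on_subset(sample, 0.0, 1.0)
us = sorted(sample)
# Standard Kolmogorov-Smirnov formula on the whole unit interval, for comparison
ks = max(max((j + 1) / 20 - v, v - j / 20) for j, v in enumerate(us))
print(abs(full - ks) < 1e-12)                               # True
print(sup_deviation_on_subset(sample, 0.25, 0.75) <= full)  # True
```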

Thus, it appears that our results hold for a very general class of weight functions. It would of course be of interest to subsume all cases under one unified criterion and one method of proof. The heuristic portion of Section 7 indicates such a criterion (symmetry and convexity of a certain functional). Unfortunately, it does not include all cases treated above (e.g., the result of Section 3, which is apparently somewhat deeper); Section 7 shows that some of these cases are slightly more difficult to handle than the symmetric convex functionals. Section 7 also indicates a more general class $\Omega$ of monotone functionals. Our results would seem likely to hold for this class (perhaps under slight regularity conditions), and it includes the weight functions of Sections 3, 4, and 5 as well as those of the previous paragraph. In the present context, $\Omega$ consists of nonnegative functionals $W$ of the function $|\delta|$ defined by $\delta(y) = g(F^{-1}(y)) - y$, $0 \le y \le 1$, for which $W(|\delta_1|) \le W(|\delta_2|)$ whenever $|\delta_1(y)| \le |\delta_2(y)|$ for $0 \le y \le 1$. (Here we suppose for simplicity that the c.d.f.'s $F$ under consideration are members of $\mathcal{F}_c$ and that each such $F$ is strictly increasing for $\sup F^{-1}(0) < x < \inf F^{-1}(1)$.) However, at this writing it is not evident how to give a rigorous unified proof (as distinguished from the heuristic one of Section 7) even for the class of weight functions which are convex symmetric functionals (of $\delta$, in the present context), let alone for the class $\Omega$.

Another modification is to consider $\sup_x |g(x) - F(x)|\, h(x)$ above instead of $\sup_x |g(x) - F(x)|\, h(F(x))$. In this case $\phi_n^*$ will not have constant risk over $\mathcal{F}_c$. However, this case is easily treated as follows. Suppose for simplicity that $h$ is continuous and bounded (the unbounded case is trivial and may be treated by a similar argument). Let $J$ be an interval in which $h$ is entirely within a prescribed $\varepsilon > 0$ of $\sup_x h(x)$. We may for simplicity suppose $J$ to be the unit interval. Then the risk function of $\phi_n^*$ will attain a value close to its maximum for $F = U$, and the argument of Sections 3 and 4 may be applied. In a similar manner we may consider in Section 5 loss functions for which the risk function of $\phi_n^*$ is not a constant. For example, (5.1) could be replaced by

$$W(F, g) = \int W\big(\sqrt{n}\,[g(x) - F(x)],\ x\big)\, d\mu(x) \qquad (6.1)$$

for a specified function $W$ and measure $\mu$ satisfying certain regularity conditions.

An interesting question is whether or not our results can be extended to yield a sequential asymptotic minimax character, e.g., in the sense of Wald [7]. This is too large a topic to be discussed thoroughly in this paragraph, but a few indicative comments are in order. An essential idea present in the form of the $\xi_{kn}$ of Sections 3, 4, and 5 is that, when $k$ is large, a certain multinomial estimation problem is almost as difficult as the problem of estimating $F$. Suppose the weight function considered here is such that the corresponding multinomial problem has (perhaps only asymptotically) a fixed sample-size minimax estimator among all sequential estimators. This suggests that we may then conclude that the fixed sample-size procedure $\phi_n^*$ is asymptotically minimax among all sequential procedures. An examination of [7] shows that such an asymptotic sequential minimax property for the multinomial problem will often be easy to prove using methods like Wald's.

Finally, the methods of this paper (without the limit considerations as $k \to \infty$) may be used to prove certain asymptotic minimax results for the estimation of the parameter $\pi$ of the multinomial distribution (3.4) as $n \to \infty$, for any fixed $k$. To see this, we note the following. Under fairly general conditions of monotonicity and symmetry of the weight function (similar to those of Sections 3 to 5), the limiting risk function of $T^{(n)}/n$ as $n \to \infty$ will be continuous in a neighborhood of the point of $B_k$ at which its maximum is achieved. Hence, for any $\varepsilon > 0$, there will exist an interior point $V_k'$ of $B_k$ with two properties: the limiting risk function of $T^{(n)}/n$ is continuous in a neighborhood of $V_k'$, and its value at $V_k'$ is within $\varepsilon$ of its maximum. One can then find a sequence $\{\xi_n'\}$ of a priori distributions on $B_k$, similar to the sequence used in Sections 3, 4, and 5, which assigns to any neighborhood of $V_k'$ a probability approaching one as $n \to \infty$. This sequence "shrinks down" on $V_k'$ at a slow enough rate (see the remarks of the paragraph preceding that containing (3.6)) to make the a posteriori distribution of the $\sqrt{n}\,(p_i - T_i^{(n)}/n)$ normal with mean 0. Consequently, $T^{(n)}/n$ is asymptotically Bayes with respect to $\{\xi_n'\}$, with integrated risk approaching the limiting risk of $T^{(n)}/n$ at $V_k'$. The asymptotic minimax character of $T^{(n)}/n$ follows. We need not detail the wide variety of weight functions for which this optimum asymptotic property of the classical multinomial estimator $T^{(n)}/n$ follows from the methods and results of the three previous sections and of the present section.

It is perhaps worth while to remark on the form of the loss. The results in Sections 3, 4, and 5 are stated in terms of deviations of the sums $\psi_{n1} + \psi_{n2} + \cdots + \psi_{nj}$ of components of the estimator $\psi_n$ from $p_1 + p_2 + \cdots + p_j$ ($1 \le j \le k+1$). Nevertheless, the given proofs apply with only trivial modifications to weight functions depending on the differences $\psi_{nj} - p_j$. Thus, for example, take any set of numbers $c_j > 0$ and the risk function

$$r_n(\pi, \psi_n) = 1 - P_\pi\big\{|\psi_{nj} - p_j| \le c_j/\sqrt{n},\ 1 \le j \le k+1\big\}. \qquad (6.2)$$

The asymptotic minimax character of $T^{(n)}/n$ for estimating $\pi \in B_k$ under this risk follows from two facts: the asymptotic normality of the a posteriori distribution, noted above, and the convexity and symmetry about $\psi_n$ of the set of $\pi$ (in $R^{k+1}$, not $B_k$) satisfying the inequalities in braces in (6.2). (It is clear from this example that $V_k'$ need not be the $V_k$ of Section 3.) The result for other risk functions follows similarly, using the methods and results of Sections 4, 5, and 6.
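The risk (6.2) of $T^{(n)}/n$ is easy to estimate by simulation, and doing so shows that it varies with $\pi$. The sketch below is ours. The sample size, the constants $c_j$, the number of replications, and the two values of $\pi$ are arbitrary illustrative choices.

```python
import math
import random

def risk_6_2(p, c, n, reps=2000, seed=0):
    """Monte Carlo estimate of r_n(pi, T/n) in (6.2) (sketch).

    p: cell probabilities p_1..p_{k+1}; c: positive constants c_1..c_{k+1}.
    Estimates 1 - P{|T_j/n - p_j| <= c_j/sqrt(n) for all j}.
    """
    rng = random.Random(seed)
    cells = range(len(p))
    inside = 0
    for _ in range(reps):
        counts = [0] * len(p)
        for cell in rng.choices(cells, weights=p, k=n):   # one multinomial sample
            counts[cell] += 1
        if all(abs(t / n - pj) <= cj / math.sqrt(n)
               for t, pj, cj in zip(counts, p, c)):
            inside += 1
    return 1 - inside / reps

c = [0.5, 0.5, 0.5]
r_center = risk_6_2([1 / 3, 1 / 3, 1 / 3], c, n=200)
r_skewed = risk_6_2([0.8, 0.1, 0.1], c, n=200)
print(r_center, r_skewed)
```

With equal $c_j$ the cells with $p_j$ near $\tfrac12$ have the largest variance $p_j(1 - p_j)$, so the estimated risk is larger at the center of $B_k$ than at the skewed point chosen here.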







These asymptotic results for the multinomial estimation problem do not seem 
to have appeared previously in the literature. As indicated two paragraphs above, 
some of these multinomial results may also be extended to sequential problems. 


7. Convex functionals and monotone functionals of stochastic processes; 
heuristic considerations. The first part of this section will be devoted to some 
simple remarks concerning convex symmetric functionals of random elements; 
these remarks will then be applied to give a short heuristic argument for many 
of the results obtained in previous sections. 

Let $B = \{b\}$ be a linear space (or system) and $\zeta$ a random element with range in $B$ and having a symmetric distribution, i.e., such that whenever $A$ is a measurable subset of $B$ so is $-A$, and $P\{\zeta \in A\} = P\{\zeta \in -A\}$. Let $w$ be a measurable real-valued functional on $B$ which is symmetric ($w(b) = w(-b)$ for $b \in B$) and convex ($w(\lambda b_1 + (1-\lambda) b_2) \le \lambda w(b_1) + (1-\lambda) w(b_2)$ for $0 < \lambda < 1$ and $b_1, b_2 \in B$). Since $\min_b w(b) = w(0) > -\infty$, the expected value $Ew(\zeta)$ is always defined, and we may conclude that

$$Ew(\zeta) = \min_{b \in B} Ew(\zeta + b). \qquad (7.1)$$

This follows from the equation (implied by the symmetry of $P$)

$$Ew(\zeta + b) = Ew(-\zeta + b) = \tfrac12 E\{w(\zeta + b) + w(-\zeta + b)\} \qquad (7.2)$$

and the inequality (implied by the symmetry and convexity of $w$)

$$w(\zeta + b) + w(-\zeta + b) = w(\zeta + b) + w(\zeta - b) \ge 2w(\zeta). \qquad (7.3)$$
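A finite-dimensional instance of (7.1) to (7.3) can be checked exactly. The sketch below is ours. It takes $B = R^2$, $\zeta$ uniform on a hypothetical symmetric four-point set, and $w$ the sup-norm (symmetric and convex). It verifies on a grid of shifts $b$ that $Ew(\zeta + b) \ge Ew(\zeta)$. Only the symmetry of the distribution is used; it need not be unimodal.

```python
from fractions import Fraction
from itertools import product

# A symmetric distribution on R^2: each point together with its negative, all equally likely.
support = [(1, 2), (-1, -2), (3, -1), (-3, 1)]

def w(v):
    """Sup-norm on R^2: a symmetric convex functional."""
    return max(abs(v[0]), abs(v[1]))

def expected_w(shift):
    """E w(zeta + b) for zeta uniform on `support` and b = shift, in exact arithmetic."""
    total = sum(Fraction(w((z[0] + shift[0], z[1] + shift[1]))) for z in support)
    return total / len(support)

grid = [Fraction(i, 4) for i in range(-12, 13)]
base = expected_w((0, 0))
print(base, all(expected_w(b) >= base for b in product(grid, repeat=2)))  # 5/2 True
```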


We shall now apply (7.1) to the "tied-down" Wiener process (see [2]). $B$ is now the space of continuous functions $b(t)$ on the unit interval $0 \le t \le 1$. The probability measure $P$ assigns probability one to the subset $B_0$ of elements $b$ of $B$ satisfying $b(0) = b(1) = 0$. The measurable sets are generated by all sets of the form $\{b \mid b(t) < a_0\}$ for $0 \le t \le 1$ and $a_0$ real. The joint distribution of $\zeta(t_1), \ldots, \zeta(t_k)$ for any $0 \le t_1 \le \cdots \le t_k \le 1$ is normal with $E\zeta(s) = 0$ and $E\{\zeta(s)\zeta(t)\} = \min(s,t) - st$ for $0 \le s, t \le 1$. Note that the distribution of $\zeta$ is symmetric.

Let $W$ be any symmetric real-valued convex function on $R^1$. Then $W(\max_t |b(t)|)$ is a convex functional of $b$, and (7.1) implies that

$$EW\big(\max_t |\zeta(t)|\big) \le EW\big(\max_t |\zeta(t) + \rho(t)|\big) \qquad (7.4)$$

for all continuous functions $\rho$. This result is easily generalized by adjoining additional functions to $B$. For instance, $\max_t |\zeta(t)|$ may be replaced by $\max_t [\,|\zeta(t)|\, h(t)\,]$ in the manner of Section 6, or $\rho$ may be allowed to be of a more general class than the continuous functions. One may also note that, for every $r > 0$,

$$P\{\max_t |\zeta(t)| > r\} \le P\{\max_t |\zeta(t) + \rho(t)| > r\} \qquad (7.5)$$

for all continuous (or more general, as noted above) functions $\rho$. However, this cannot be proved in the same manner as (7.4), since the characteristic function of the subset of $B$ for which $\max_t |b(t)| > r$ is not a convex functional on $B$.
The validity of (7.5) follows, however, from (2.4) and Lemma 1. This strong result, a domination of an entire distribution function in the sense of (7.5), is deeper than the result (7.4). The reason is that (7.5) requires (in the proof of Lemma 1 in [8]) more than the symmetry of the probability distribution: it also requires that, for every $u > 0$, the set where the joint density of $\zeta(t_1), \ldots, \zeta(t_k)$ (for any $k$ and $t_1, \ldots, t_k$) is $\ge u$ be convex. (Note, for example, that it is not necessarily true for a symmetrically distributed real-valued random variable $X$ that $P\{|X| > r\} \le P\{|X + \rho| > r\}$ for all real $\rho$.) Similarly, consider real functions $W(z)$ on the nonnegative reals which are nondecreasing in $z$ for $z \ge 0$ but not necessarily convex. For such $W$, the result (7.4) is a consequence of (7.5) but cannot be proved directly in the manner of (7.4) for convex $W$. To summarize: (7.4) for convex $W$ follows from the symmetry of the probability measure alone. Proving (7.4) for nondecreasing $W$ (and, in particular, (7.5)) uses the additional assumption on the probability measure which is used to prove Lemma 1. We note that it has not been necessary to assume any integrability condition here.
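The parenthetical remark above can be made concrete with a two-point distribution (our own example). $X = \pm 1$ with probability $\tfrac12$ each is symmetric, but its level sets are not convex. With $r = \tfrac12$, a shift by $\rho = 1$ lowers $P\{|X + \rho| > r\}$ from $1$ to $\tfrac12$.

```python
from fractions import Fraction

# X = -1 or +1 with probability 1/2 each: symmetric, but the set where its
# probability function is >= u (for 0 < u <= 1/2) is {-1, 1}, which is not convex.
dist = {-1: Fraction(1, 2), 1: Fraction(1, 2)}

def tail(shift, r):
    """P{|X + shift| > r} for the two-point distribution above."""
    return sum(p for x, p in dist.items() if abs(x + shift) > r)

r = Fraction(1, 2)
print(tail(0, r), tail(1, r))  # 1 1/2
```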

It is interesting to note that, for the special case of a linear function $\rho(t) = c + dt$, the right side of (7.5) is given by formula (4.3) of [2] with $a = r - c - d$, $b = r - c$, $\alpha = r + c + d$, $\beta = r + c$. (Unless $a \ge 0$, $b > 0$, $\alpha \ge 0$, $\beta > 0$, the probability in question is unity; our $\zeta(t)$ is Doob's $X(t)$.) It does not seem completely apparent from the form of (4.3) of [2] that this expression, with the above substitutions, is a minimum for $c = d = 0$. The same is true of expectations, with respect to the d.f. (4.3) of [2], of functions $W$ of the type considered above.
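The claim that the right side of (7.5) with $\rho(t) = c + dt$ is smallest at $c = d = 0$ can at least be illustrated by simulation. The sketch below is ours and does not use Doob's formula (4.3). It simulates the tied-down Wiener process on a grid and estimates the right side of (7.5) for a few illustrative $(c, d)$. For $c = d = 0$ it compares the estimate with the Kolmogorov series $P\{\max_t |\zeta(t)| > r\} = 2\sum_{j \ge 1} (-1)^{j-1} e^{-2j^2 r^2}$. The path count, grid size, seed, and $r = 1$ are arbitrary choices. The grid maximum slightly understates the continuous-time maximum.

```python
import math
import random

def bridge_exceedance(shifts, r=1.0, paths=3000, steps=256, seed=12345):
    """Monte Carlo estimate of P{max_t |zeta(t) + c + d t| > r} for each (c, d).

    zeta is simulated on a grid as B(t) = W(t) - t W(1), W a Wiener process.
    The same paths are used for every shift (common random numbers).
    """
    rng = random.Random(seed)
    dt = 1.0 / steps
    times = [i * dt for i in range(steps + 1)]
    hits = [0] * len(shifts)
    for _ in range(paths):
        w = [0.0]
        for _ in range(steps):
            w.append(w[-1] + rng.gauss(0.0, math.sqrt(dt)))
        bridge = [wi - t * w[-1] for wi, t in zip(w, times)]   # tied down at t = 0 and t = 1
        for idx, (c, d) in enumerate(shifts):
            if max(abs(b + c + d * t) for b, t in zip(bridge, times)) > r:
                hits[idx] += 1
    return [h / paths for h in hits]

def kolmogorov_tail(r, terms=100):
    """P{max_t |zeta(t)| > r} = 2 * sum_{j>=1} (-1)^(j-1) exp(-2 j^2 r^2)."""
    return 2 * sum((-1) ** (j - 1) * math.exp(-2 * j * j * r * r)
                   for j in range(1, terms + 1))

shifts = [(0.0, 0.0), (0.3, 0.0), (0.0, 0.6), (-0.3, 0.6)]
estimates = bridge_exceedance(shifts)
for (c, d), p in zip(shifts, estimates):
    print(f"c={c:+.1f} d={d:+.1f}  P ~ {p:.3f}")
print(f"series value for c = d = 0: {kolmogorov_tail(1.0):.3f}")
```

A simulation of this kind illustrates, but of course does not prove, the minimizing property that (7.5) guarantees.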

Next, let the real-valued function $W(y,z)$ be symmetric and convex in $y$ for each $z$ ($-\infty < y < \infty$, $0 \le z \le 1$) and satisfy obvious measurability conditions. Let $\mu$ be any measure on the unit interval. Then

$$w(b) = \int_0^1 W\big(b(t), t\big)\, d\mu(t) \qquad (7.6)$$

is a convex functional on $B$, and hence (7.1) holds. Now suppose instead that $W(y,z)$ is symmetric in $y$ and nondecreasing in $y$ for $y \ge 0$, but not necessarily convex for each $z$. The result in this case is not much more difficult, although it cannot be handled by using (7.1): we need only apply Lemma 1 for $n = 1$ for each fixed $z$ in order to obtain the desired result.

More general convex functionals (such as combinations of the two varieties, as treated in Section 6) may be handled similarly by using (7.1). Nonconvex functionals with certain monotonicity properties may be handled by consequences of Lemma 1 similar to (7.5). It is possible that the conclusion (7.1) holds for the class $\Omega$ of (not necessarily convex) functionals $w$ which are nonnegative, for which $w(f) = w(|f|)$, and for which $w(|f_1|) \le w(|f_2|)$ whenever $|f_1(t)| \le |f_2(t)|$ for all $t$.


Similarly, results may be obtained for processes other than the tied-down Wiener process. If their distributions are symmetrical, we may use (7.1). If they also satisfy the property which (as mentioned above) is used in [8] in proving the more general form of Lemma 1, we may use the generalization of Lemma 1 in [8].
We now turn to a heuristic argument for the results obtained in previous sec- 
tions (except for certain results of Section 6, as noted below). This discussion 
may also be thought of as an outline of one intuitive explanation of why these 
results hold, the epsilontics and use of Bayes solutions in the previous sections 
supplying the needed rigor. However, the discussion which follows does not use 
Bayes solutions, and it would certainly be worth while to obtain an independent 
argument which would show that in the limit one need only consider “limiting” 
decision procedures of the type considered below, and thus to conclude that the 
argument which follows can be made rigorous by means of only brief additions. 
In the previous sections we were concerned with estimating (for various weight functions) an unknown element $F$ of $\mathcal{F}_c$. Denote by $g_n(x; X^{(n)})$ such an estimator of $F(x)$ based on $X^{(n)}$ (for notational simplicity, we consider a nonrandomized $\phi_n \in D_n$). Suppose we could show that, for our considerations, at least asymptotically, it is only necessary to consider functions $g_n$ of the form

$$g_n(x; X^{(n)}) = \psi_n(S_n(x)),$$

i.e., procedures in the class $C_n$ mentioned in the last paragraph of Section 5. This is one of two crucial gaps in our heuristic argument: it is not obvious how to give a short proof of this supposition which in no way depends on the results of Sections 3 to 5. Procedures in $C_n$ will have constant risk for $F \in \mathcal{F}_c$ and for any of the weight functions of Sections 3, 4, and 5. The same holds for those of Section 6 for which $\phi_n^*$ was not remarked to have nonconstant risk. Thus, we may consider the distribution of the random function $\psi_n(S_n(t)) - t$ for $0 \le t \le 1$ where $F = U$. Write $\psi_n(z) = z + \rho_n(z)/\sqrt{n}$. Then

$$\sqrt{n}\,[\psi_n(S_n(t)) - t] = \sqrt{n}\,[S_n(t) - t] + \rho_n(S_n(t)).$$

Now suppose (and this is the other crucial nonrigorous step) that there is a sequence $\{\psi_n\}$ of minimax procedures ($n = 1, 2, \ldots$) such that the corresponding sequence $\{\rho_n(z)\}$ has a continuous limit $\rho(z)$, uniformly in $z$, as $n \to \infty$. Then $\rho_n(S_n(t))$ would be bounded (for $n$ sufficiently large) and would tend to $\rho(t)$ with probability one as $n \to \infty$. By [2] and [3], our consideration of $\sqrt{n}\,[\psi_n(S_n(t)) - t]$ would then be reduced, asymptotically, to that of $\zeta(t) + \rho(t)$. The earlier comments of this section would then yield the desired asymptotic minimax properties of $\phi_n^*$.


REFERENCES

[1] A. N. Kolmogorov, "Sulla determinazione empirica di una legge di distribuzione," Giorn. Ist. Ital. Attuari, Vol. 4 (1933), pp. 83-91.

[2] J. L. Doob, "Heuristic approach to the Kolmogorov-Smirnov theorems," Ann. Math. Stat., Vol. 20 (1949), pp. 393-403.

[3] M. Donsker, "Justification and extension of Doob's heuristic approach to the Kolmogorov-Smirnov theorems," Ann. Math. Stat., Vol. 23 (1952), pp. 277-281.

[4] Paul Lévy, Théorie de l'Addition des Variables Aléatoires, Gauthier-Villars, Paris (1937).

[5] J. V. Uspensky, Introduction to Mathematical Probability, McGraw-Hill Book Co., New York (1937).

[6] O. P. Aggarwal, "Some minimax invariant procedures for estimating a c.d.f.," Ann. Math. Stat., Vol. 26 (1955), pp. 450-463.

[7] A. Wald, "Asymptotic minimax solutions of sequential point estimation problems," Proceedings of the Second Berkeley Symposium on Mathematical Statistics and Probability, University of California Press (1951), pp. 1-11.

[8] T. W. Anderson, "The integral of a symmetric unimodal function," Proc. Amer. Math. Soc., Vol. 6 (1955), pp. 170-176.

[9] N. V. Smirnoff, "Approach of empiric distribution functions," Uspyekhi Matem. Nauk, Vol. 10 (1944), pp. 179-206.





A COMPARISON OF TESTS ON THE MEAN OF A LOGARITHMICO-NORMAL DISTRIBUTION WITH KNOWN VARIANCE¹


By NORMAN C. SEVERO² AND EDWIN G. OLDS
Carnegie Institute of Technology


1. Summary. Three test procedures are considered for testing an hypothesis 
on the mean of a logarithmico-normal distribution with known variance. The 
first is a normal theory test applied to the logarithms of the original data; the 
second is a normal theory test applied to the original data; and the third is a 
test based on the Neyman-Pearson Lemma. 

The operating characteristics of these tests are developed and some asymp- 
totic properties obtained. It is found that the three procedures give quite dif- 
ferent results unless the mean under the null hypothesis is large relative to the 
standard deviation. 


2. Introduction. The studies of the correct transformation to be applied to 
data in order to more closely fulfill the assumptions underlying a statistical test 
occupy an important place in the statistical literature. In particular, the use of 
the logarithmic transformation is widely advocated in cases where the error 
distribution is known to be logarithmico-normal; or where component effects 
in the analysis of variance are multiplicative; or where variance heterogeneity is 
such that the variance is proportional to the square of the mean. The logarithmic 
transformation would then make the error distribution normal; or cause the 
effects to be additive; or homogenize the error variance. Thus, a transformation 
is effected in order to force an observed, and slightly unconventional, model 
into a well-known and rather well understood model. 

The present investigation is concerned with the application of the logarithmic 
transformation to the problem of testing an hypothesis on the mean of a logarith- 
mico-normal variate with known variance. An experimenter can fail to recog- 
nize the need for a transformation and simply proceed to apply normal theory 
tests to the original data, or he can properly transform the data and then apply 
a normal theory test to a parameter of the transformed scale. Each of these 
testing procedures is investigated in detail. 

Finally, a third test procedure is developed by using the Neyman-Pearson 
Lemma for testing simple hypotheses. 


A comparison of these tests is then made by means of their operating characteristics, and some asymptotic properties are obtained. It is found that the three procedures give quite different results unless the mean under the null hypothesis is large relative to the standard deviation.

Received June 6, 1955.

¹ Prepared in connection with research sponsored by the Aeronautical Research Laboratory, Wright Air Development Center.

² Summary of thesis submitted by the first author in partial fulfillment of the requirements for the degree of Doctor of Philosophy at the Carnegie Institute of Technology.

³ Fulbright and Swedish Government Fellow at the University of Stockholm, September, 1955-July, 1956.


3. Statement of the problem. Let $y$ be a normal variate with probability density

$$g(y; \mu_y, \sigma_y) = \frac{1}{\sigma_y \sqrt{2\pi}}\, e^{-(y - \mu_y)^2/2\sigma_y^2}. \qquad (3.1)$$

Then $x$, defined through $y = \ln x$, is a logarithmico-normal variate with probability density

$$f(x; \mu_y, \sigma_y) = \frac{1}{\sigma_y\, x \sqrt{2\pi}}\, e^{-(\ln x - \mu_y)^2/2\sigma_y^2}, \qquad x > 0. \qquad (3.2)$$

If the mean and variance of $x$ are designated by $\mu_x$ and $\sigma_x^2$, respectively, then the following relationships hold [1]:

$$\mu_x = e^{\mu_y + \sigma_y^2/2}, \qquad \sigma_x^2 = e^{2\mu_y + \sigma_y^2}\big(e^{\sigma_y^2} - 1\big). \qquad (3.3)$$

Solving (3.3) for yw, and «) gives 

us 


1/6 2? 
Vo, t+ x: 


: a; 
v, = inf + “|. 
Mz 


If it is assumed that a, is known, the problem is how to test the null hypothesis 
Ho: pz om: , against the simple alternative H; :u, = we > om: , ata significance 


My = |h 


level of a, using a sample O, : 21,22, -++:,2,, Where the 2; are statistically 
independent. 
There is no loss in generality in taking ¢; = 1, since it is always possible to 


make a change of variables by dividing the variable, x, by the known standard 
deviation. Thus, equations (3.4) may be written as 


Nella 
V1 + ui? 


¢, = inf + 4| 
Mz 


which means (3.1) and (3.2) may be written as 





ww = In 


g(y; uy » %y) = 9(Y3 ue), 
f(x; my, oy) = F(x; me). 

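The relations (3.3) and (3.4) are easy to check numerically. The following minimal Python sketch (standard library only; the function names are ours, not the authors') implements both directions and confirms the round trip with $\sigma_x = 1$, i.e., equations (3.5).

```python
import math


def log_scale_params(mu_x, sigma_x=1.0):
    """Inverse relations (3.4): mean and s.d. of y = ln x from those of x.

    With sigma_x = 1 this reduces to (3.5).
    """
    sigma_y2 = math.log1p(sigma_x**2 / mu_x**2)
    mu_y = math.log(mu_x**2 / math.sqrt(sigma_x**2 + mu_x**2))
    return mu_y, math.sqrt(sigma_y2)


def original_scale_moments(mu_y, sigma_y):
    """Relations (3.3): mean and s.d. of x = e^y when y is N(mu_y, sigma_y^2)."""
    mu_x = math.exp(mu_y + sigma_y**2 / 2.0)
    var_x = math.exp(2.0 * mu_y + sigma_y**2) * math.expm1(sigma_y**2)
    return mu_x, math.sqrt(var_x)


# Round trip for the two null values used in the tables (sigma_x = 1).
for mu0 in (1.0, 10.0):
    mu_y, sigma_y = log_scale_params(mu0)
    print(mu0, round(mu_y, 4), round(sigma_y, 4), original_scale_moments(mu_y, sigma_y))
```

Using `log1p` and `expm1` keeps the computation accurate when $\mu_x$ is large and $\sigma_y^2$ is therefore tiny, which is exactly the regime studied in Section 7.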

4. Normal theory test applied to $y = \ln x$. The first test procedure is suggested by the fact that $y = \ln x$ is a normal variate. In fact, under $H_0$, $y$ is

672 NORMAN C. SEVERO AND EDWIN G. OLDS 


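$N({}_0\mu_y, {}_0\sigma_y^2)$,⁴ where ${}_0\mu_y$ and ${}_0\sigma_y$ represent the values of (3.5) at $\mu_x = {}_0\mu_x$. Furthermore, since ${}_1\mu_y > {}_0\mu_y$, where ${}_1\mu_y$ is the value of $\mu_y$ at $\mu_x = {}_1\mu_x$, it is possible to test $H_0: \mu_y = {}_0\mu_y$ against $H_1: \mu_y = {}_1\mu_y > {}_0\mu_y$ by using the test statistic $(\bar y - {}_0\mu_y)\sqrt n/{}_0\sigma_y$, with a critical region specified by

$$T_1 = \left\{(x_1,\dots,x_n)\;\middle|\;\frac{\bar y - {}_0\mu_y}{{}_0\sigma_y/\sqrt n} \ge z_\alpha\right\}, \tag{4.1}$$

where $\bar y = \sum y_i/n$, $y_i = \ln x_i$, and $z_\alpha$ is such that

$$\int_{z_\alpha}^{\infty}\frac{1}{\sqrt{2\pi}}\,e^{-t^2/2}\,dt = \alpha. \tag{4.2}$$

(In short, the test can be characterized by $T_1$.) Thus, the testing procedure may be performed by a normal theory test on the transformed scale.

Under an alternative $\mu_x > {}_0\mu_x$ the distribution of $\bar y = \sum \ln x_i/n$ is $N(\mu_y, \sigma_y^2/n)$, so that the operating characteristic becomes

$$\beta_{T_1} = P\left\{\frac{\bar y - {}_0\mu_y}{{}_0\sigma_y/\sqrt n} < z_\alpha\right\} = \Phi\left(\frac{z_\alpha\,{}_0\sigma_y/\sqrt n + {}_0\mu_y - \mu_y}{\sigma_y/\sqrt n}\right), \tag{4.3}$$

where $\Phi$ is the standard normal distribution function. By using equations (3.5), one may write this as

$$\beta_{T_1} = \Phi\left(\frac{\dfrac{z_\alpha}{\sqrt n}\sqrt{\ln\left(1+\dfrac{1}{{}_0\mu_x^2}\right)} + \ln\dfrac{{}_0\mu_x^2}{\sqrt{1+{}_0\mu_x^2}} - \ln\dfrac{\mu_x^2}{\sqrt{1+\mu_x^2}}}{\dfrac{1}{\sqrt n}\sqrt{\ln\left(1+\dfrac{1}{\mu_x^2}\right)}}\right), \tag{4.4}$$

which depends upon ${}_0\mu_x$, $\mu_x$, $\alpha$, and $n$.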

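Formula (4.4) is straightforward to evaluate. The sketch below assumes Python's standard `statistics.NormalDist` for $\Phi$ and $z_\alpha$; `oc_t1` and `log_scale_params` are illustrative names of our own. It is not the authors' computing procedure, and its output may differ from the printed entries of Table I in the second decimal.

```python
import math
from statistics import NormalDist

PHI = NormalDist()  # standard normal: .cdf and .inv_cdf


def log_scale_params(mu_x):
    """(3.5): (mu_y, sigma_y) when x has mean mu_x and known s.d. 1."""
    return (math.log(mu_x**2 / math.sqrt(1.0 + mu_x**2)),
            math.sqrt(math.log1p(1.0 / mu_x**2)))


def oc_t1(mu0_x, mu_x, n, alpha=0.05):
    """Probability that T1 accepts H0 when the true mean of x is mu_x, eq. (4.3)/(4.4)."""
    z_alpha = PHI.inv_cdf(1.0 - alpha)            # upper-alpha point, (4.2)
    mu0_y, sigma0_y = log_scale_params(mu0_x)
    mu_y, sigma_y = log_scale_params(mu_x)
    root_n = math.sqrt(n)
    arg = (z_alpha * sigma0_y / root_n + mu0_y - mu_y) / (sigma_y / root_n)
    return PHI.cdf(arg)


# One row per delta; columns ordered (n, 0mu_x) = (4, 1), (4, 10), (25, 1), (25, 10).
for delta in (0.0, 0.2, 0.4, 0.6, 0.8, 1.0):
    row = [oc_t1(mu0, mu0 + delta, n) for n in (4, 25) for mu0 in (1.0, 10.0)]
    print(delta, [round(b, 2) for b in row])
```

At $\delta = 0$ the argument of $\Phi$ collapses to $z_\alpha$, so every column starts at $1 - \alpha$, as in the first row of Table I.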

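The operating characteristics of the $T_1$-test were computed for the following four cases:

$n = 4$, ${}_0\mu_x = 1$, $\alpha = .05$;
$n = 4$, ${}_0\mu_x = 10$, $\alpha = .05$;
$n = 25$, ${}_0\mu_x = 1$, $\alpha = .05$;
$n = 25$, ${}_0\mu_x = 10$, $\alpha = .05$.

The computed values are tabulated in Table I and the corresponding curves are given in Fig. 1, where the following notation is used:

$$\delta = \mu_x - {}_0\mu_x. \tag{4.5}$$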


⁴ The notation $N(\mu, \sigma^2)$ is used to denote a normal distribution with mean $\mu$ and variance $\sigma^2$.





TABLE I
Probabilities that the $T_1$-test with sample size $n$ will accept $\mu_x = {}_0\mu_x$ when the mean is at ${}_0\mu_x + \delta$

                 n = 4                 n = 25
 δ        ₀μₓ = 1   ₀μₓ = 10     ₀μₓ = 1   ₀μₓ = 10
 0.0        .95       .95          .95       .95
 0.2        .88       .89          .53       .74
 0.4        .74       .81          .06       ?
 0.6        .55       .68          .00       .10
 0.8        .33       .54          .00       .01
 1.0        .16       .39
 1.2        .05       .26
 1.4        .01       .16
 1.6        .00       .09
 1.8        .00       .03
 2.0        .00       .01

[Fig. 1. Operating Characteristics for the $T_1$ Test (horizontal axis: standard deviations from ${}_0\mu_x$).]


Obviously, the power is not invariant under a translation in ${}_0\mu_x$. For fixed $n$, the power in discerning a shift of $K$ units (measured in standard deviations) from the null hypothesis decreases as the value specified by the null hypothesis increases; i.e., the test is more powerful when that value is small than when it is large. (Since the known variance was assumed to be unity, it might be helpful to rephrase this to read: The region $T_1$ for testing the mean of a logarithmico-normal variate becomes more powerful against alternatives greater than the hypothesized mean as the ratio of the hypothesized mean to the known standard deviation decreases.)

Further details on the properties of the $T_1$-test for large ${}_0\mu_x$ are given in Section 7.

5. Normal theory test applied to $x$. The second procedure to be studied is one which might be applied by the experimenter who, because of either blissful ignorance or wishful thinking, assumes that the universe sampled is close enough to a normal universe to justify a test based on normal theory.

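Erroneously considering the logarithmico-normal variate $x$ as though it were actually $N(\mu_x, 1)$ leads to the critical region

$$T_2 = \left\{(x_1,\dots,x_n)\;\middle|\;\frac{\bar x - {}_0\mu_x}{1/\sqrt n} \ge z_\alpha\right\}, \tag{5.1}$$

where $\bar x = \sum x_i/n$ and $z_\alpha$ is defined in (4.2).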

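The calculation of the operating characteristic at any alternative $\mu_x > {}_0\mu_x$ for the $T_2$-test is an easy matter for the case $n = 1$. If the mean of $x$ is $\mu_x > {}_0\mu_x$, then $y = \ln x$ is $N(\mu_y, \sigma_y^2)$, where $\mu_y = \ln[\mu_x^2/\sqrt{1+\mu_x^2}]$ and $\sigma_y^2 = \ln[1+1/\mu_x^2]$, and so

$$\begin{aligned}
\beta_{T_2} &= P\{x \le z_\alpha + {}_0\mu_x\} = P\{\ln x \le \ln(z_\alpha + {}_0\mu_x)\}\\
&= P\left\{\frac{\ln x - \mu_y}{\sigma_y} \le \frac{\ln(z_\alpha + {}_0\mu_x) - \mu_y}{\sigma_y}\right\} = \Phi\left(\frac{\ln(z_\alpha + {}_0\mu_x) - \mu_y}{\sigma_y}\right).
\end{aligned}$$

By using equation (3.5) this becomes

$$\beta_{T_2} = \Phi\left(\frac{\ln(z_\alpha + {}_0\mu_x) - \ln\dfrac{\mu_x^2}{\sqrt{1+\mu_x^2}}}{\sqrt{\ln\left(1+\dfrac{1}{\mu_x^2}\right)}}\right), \tag{5.2}$$

which depends only upon ${}_0\mu_x$, $\mu_x$, and $\alpha$.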

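For $n = 1$, (5.2) can be evaluated directly. The following sketch (illustrative names, standard library only) computes the operating characteristic and, at $\delta = 0$, the actual size $1 - \beta_{T_2}$ that the experimenter incurs while believing it to be $\alpha$.

```python
import math
from statistics import NormalDist

PHI = NormalDist()


def oc_t2_single(mu0_x, mu_x, alpha=0.05):
    """Exact OC of the naive normal-theory test T2 for n = 1, eq. (5.2)."""
    z_alpha = PHI.inv_cdf(1.0 - alpha)
    mu_y = math.log(mu_x**2 / math.sqrt(1.0 + mu_x**2))   # (3.5)
    sigma_y = math.sqrt(math.log1p(1.0 / mu_x**2))
    return PHI.cdf((math.log(z_alpha + mu0_x) - mu_y) / sigma_y)


# Actual size 1 - oc at delta = 0, to be compared with the nominal alpha = 0.05.
for mu0 in (1.0, 10.0, 100.0):
    print(mu0, round(1.0 - oc_t2_single(mu0, mu0), 4))
```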
The operating characteristic of the $T_2$-test is more difficult to obtain for the case $n > 1$, because the convolution of $n$ logarithmico-normal variates is needed. Since this could not be obtained in closed form, the procedure adopted was to obtain an Edgeworth form of the Gram-Charlier Type A series expansion and then to use a sufficient number of terms to calculate power correctly to two decimals.

The Edgeworth expansion for the distribution of the variate

$$X = \frac{\xi_n - E(\xi_n)}{\sigma_{\xi_n}}, \tag{5.3}$$

where $\xi_n = x_1 + \cdots + x_n$, with the $x_i$ independent, and where $E(\xi_n)$ and $\sigma_{\xi_n}$ denote the mean and standard deviation of $\xi_n$, respectively, is given in Cramér [2, p. 229] as


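$$F(X) = \Phi(X) - \frac{\gamma_1}{3!}\,\frac{\Phi^{(3)}(X)}{n^{1/2}} + \frac{1}{n}\left[\frac{\gamma_2}{4!}\,\Phi^{(4)}(X) + \frac{10\gamma_1^2}{6!}\,\Phi^{(6)}(X)\right] + O\left(n^{-3/2}\right), \tag{5.4}$$

where $\gamma_1$ and $\gamma_2$ are the coefficients of skewness and kurtosis of the $x_i$ variate. For the logarithmico-normal variate, the skewness and kurtosis become

$$\gamma_1 = (\Gamma-1)^{1/2}(\Gamma+2), \qquad \gamma_2 = (\Gamma-1)(\Gamma^3+3\Gamma^2+6\Gamma+6), \tag{5.5}$$

where

$$\Gamma = e^{\sigma_y^2} = 1 + 1/\mu_x^2.$$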


If these results are used, the operating characteristic for $T_2$ when $n > 1$ at some $\mu_x > {}_0\mu_x$ becomes

$$\beta_{T_2} = P\left\{\bar x \le \frac{z_\alpha}{\sqrt n} + {}_0\mu_x\right\} = P\left\{\frac{\bar x - \mu_x}{1/\sqrt n} \le z_\alpha + \frac{{}_0\mu_x - \mu_x}{1/\sqrt n}\right\} = P\{X \le z_\alpha - \delta\sqrt n\}. \tag{5.6}$$

Therefore, for $n > 1$, the operating characteristic for $T_2$ at a mean $\mu_x > {}_0\mu_x$ may be written as

$$\beta_{T_2} = F(z_\alpha - \delta\sqrt n), \tag{5.7}$$

where $F(X)$ is given by (5.4) with the coefficients determined by (5.5).

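The Edgeworth route of (5.3) through (5.7) can be sketched as follows. This is an assumption-laden illustration: it keeps only the terms displayed in (5.4), whereas the authors used as many terms as were needed for two-decimal accuracy, so for small ${}_0\mu_x$ (where $\gamma_1$ and $\gamma_2$ are large) it need not reproduce Table II or the actual level of about 0.039 reported in the next paragraph. The derivatives use $\Phi^{(k)}(x) = (-1)^{k-1}He_{k-1}(x)\varphi(x)$ with the probabilists' Hermite polynomials; all names are ours.

```python
import math
from statistics import NormalDist

PHI = NormalDist()


def lognormal_skew_kurt(mu_x):
    """(5.5): skewness gamma_1 and kurtosis gamma_2 of x, with Gamma = 1 + 1/mu_x^2."""
    g = 1.0 + 1.0 / mu_x**2
    gamma1 = math.sqrt(g - 1.0) * (g + 2.0)
    gamma2 = (g - 1.0) * (g**3 + 3.0 * g**2 + 6.0 * g + 6.0)
    return gamma1, gamma2


def edgeworth_cdf(x, n, gamma1, gamma2):
    """Expansion (5.4) truncated after the 1/n terms."""
    phi = PHI.pdf(x)
    d3 = (x**2 - 1.0) * phi                          # Phi'''(x)
    d4 = -(x**3 - 3.0 * x) * phi                     # Phi''''(x)
    d6 = -(x**5 - 10.0 * x**3 + 15.0 * x) * phi      # Phi^(6)(x)
    return (PHI.cdf(x)
            - gamma1 / 6.0 * d3 / math.sqrt(n)
            + (gamma2 / 24.0 * d4 + 10.0 * gamma1**2 / 720.0 * d6) / n)


def oc_t2(mu0_x, mu_x, n, alpha=0.05):
    """(5.7): approximate probability that T2 accepts H0 when the mean of x is mu_x."""
    z_alpha = PHI.inv_cdf(1.0 - alpha)
    delta = mu_x - mu0_x
    gamma1, gamma2 = lognormal_skew_kurt(mu_x)       # moments of the true distribution
    return edgeworth_cdf(z_alpha - delta * math.sqrt(n), n, gamma1, gamma2)


for mu0 in (1.0, 10.0):
    print(mu0, [round(oc_t2(mu0, mu0 + d, 4), 3) for d in (0.0, 0.4, 0.8, 1.2)])
```

Note that $\gamma_1$ and $\gamma_2$ are evaluated at the true mean $\mu_x$, because $X$ in (5.3) is standardized under the distribution actually sampled.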
The operating characteristics for the same cases studied in Section 4 have been computed by using the above expansions. The calculated values are given in Table II and the corresponding graphs in Fig. 2.

Now, for $n = 4$, the $T_2$-test where ${}_0\mu_x$ is equal to 10 standard deviations is more powerful for distinguishing departures of less than 1.2 standard deviations than is the $T_2$-test where ${}_0\mu_x$ is equal to 1 standard deviation. For departures greater than 1.2 standard deviations, the converse is true. Note also that the $T_2$-test for ${}_0\mu_x = 10$ has an actual $\alpha$ level almost identical with the one for which the test was supposedly constructed, i.e., $\alpha = 0.05$. For ${}_0\mu_x = 1$, however, the true $\alpha$ level is around 0.039 instead of the 0.05 the experimenter believed he was using. Similar results hold for the case $n = 25$. Thus the $T_2$ procedure would appear to give a rather satisfactory $\alpha$ level when the value specified by the null hypothesis is large (or, in general, when the ratio of the value specified by the null hypothesis to the known standard deviation is large). When the value specified by the null hypothesis is small, the experimenter is actually running a smaller risk of rejecting the null hypothesis when true than the risk for which he had constructed the test.

TABLE II
Probabilities that the $T_2$-test with sample size $n$ will accept $\mu_x = {}_0\mu_x$ when the mean is at ${}_0\mu_x + \delta$

[Fig. 2. Operating Characteristics for the $T_2$ Test (horizontal axis: standard deviations from ${}_0\mu_x$).]


More details on the asymptotic properties of the $T_2$-test are given in Section 7.







6. Test based on Neyman-Pearson theory. The third test considered is dictated by the Neyman-Pearson Lemma (see [2]) for testing a simple statistical hypothesis. The test may be characterized by the critical region

$$T_3 = \left\{(x_1,\dots,x_n)\;\middle|\;\frac{\prod_{i=1}^n f(x_i;{}_1\mu_x)}{\prod_{i=1}^n f(x_i;{}_0\mu_x)} \ge k\right\}, \tag{6.1}$$

where $k$ is such that

$$\int\cdots\int_{T_3}\prod_{i=1}^n f(x_i;{}_0\mu_x)\,dx_1\cdots dx_n = \alpha. \tag{6.2}$$

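The inequality in the expression for $T_3$ can be shown to reduce to

$$\sum_{i=1}^n\left(\ln x_i - \frac{b}{a}\right)^2 \le k',$$

where

$$a = \frac{1}{{}_1\sigma_y^2} - \frac{1}{{}_0\sigma_y^2}, \qquad b = \frac{{}_1\mu_y}{{}_1\sigma_y^2} - \frac{{}_0\mu_y}{{}_0\sigma_y^2},$$

and where

$$k' = \frac{2}{a}\left[n\ln\frac{{}_0\sigma_y}{{}_1\sigma_y} - \ln k + \frac{n}{2}\left(\frac{{}_0\mu_y^2}{{}_0\sigma_y^2} - \frac{{}_1\mu_y^2}{{}_1\sigma_y^2}\right)\right] + \frac{nb^2}{a^2}.$$

(Since ${}_1\mu_x > {}_0\mu_x$ implies ${}_1\sigma_y < {}_0\sigma_y$ by (3.5), $a > 0$, so dividing by $a$ preserves the direction of the inequality.)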


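The value of $k'$ must now be found such that (6.2) is satisfied. Under $H_0$,

$$\ln x - (b/a) \text{ is } N\left({}_0\mu_y - b/a,\ {}_0\sigma_y^2\right). \tag{6.3}$$

Now the variate

$$\chi'^2 = \sum_{i=1}^n\frac{z_i^2}{\sigma^2}, \tag{6.4}$$

where the $z_i$ are independent and $N(a_i, \sigma^2)$ with the $a_i$ not all zero, has a noncentral $\chi^2$ distribution with $n$ degrees of freedom and probability density

$$p(\chi'^2) = e^{-(\chi'^2+\lambda)/2}\sum_{j=0}^{\infty}\frac{(\lambda/2)^j\,(\chi'^2)^{n/2+j-1}}{j!\,2^{n/2+j}\,\Gamma(n/2+j)}, \tag{6.5}$$

where $\lambda = \sum a_i^2/\sigma^2$. Hence the value $k'$ is such that

$$\int_0^{k'/{}_0\sigma_y^2} p(\chi'^2)\,d\chi'^2 = \alpha, \tag{6.6}$$

where the parameter $\lambda$ in the function $p(\chi'^2)$ is given by $n({}_0\mu_y - b/a)^2/{}_0\sigma_y^2$.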


If the value of $k'/{}_0\sigma_y^2$ that satisfies (6.6) is denoted by $\chi^2_{0,\alpha}$, then the critical region (6.1) may be written as

$$T_3 = \left\{(x_1,\dots,x_n)\;\middle|\;\sum_{i=1}^n\frac{(\ln x_i - b/a)^2}{{}_0\sigma_y^2} \le \chi^2_{0,\alpha}\right\}. \tag{6.7}$$







It is interesting to note that this test, although most powerful for testing $H_0$ against the simple alternative $H_1$, is not uniformly most powerful against any class of alternatives, since the test statistic involves the quantity $b/a$, which depends not only on ${}_0\mu_x$ but also on ${}_1\mu_x$.

Extensive tables of noncentral $\chi^2$ are not yet available, so it often becomes necessary to use approximations, which are discussed, for example, by Patnaik [3] and Abdel-Aty [4]. (Pearson and Hartley [5] promise to include more extensive tables of this nature in their second volume, to be published soon.)

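When the mean of $x$ is some $\mu_x > {}_0\mu_x$, the variate $\ln x - (b/a)$ is $N(\mu_y - b/a, \sigma_y^2)$, where $\mu_y$ and $\sigma_y$ are determined from equation (3.5). Hence, according to (6.4) and (6.5), the quantity

$$\sum_{i=1}^n\frac{(\ln x_i - b/a)^2}{\sigma_y^2} \tag{6.8}$$

follows a noncentral $\chi^2$-distribution with parameter $\lambda = n(\mu_y - b/a)^2/\sigma_y^2$ and degrees of freedom equal to $n$. Using this fact, the operating characteristic of $T_3$ for some $\mu_x > {}_0\mu_x$ becomes

$$\begin{aligned}
\beta_{T_3} &= P\left\{\sum_{i=1}^n\frac{(\ln x_i - b/a)^2}{{}_0\sigma_y^2} > \chi^2_{0,\alpha}\right\} = P\left\{\sum_{i=1}^n\frac{(\ln x_i - b/a)^2}{\sigma_y^2} > \frac{{}_0\sigma_y^2}{\sigma_y^2}\,\chi^2_{0,\alpha}\right\}\\
&= P\{\chi'^2 > c\,\chi^2_{0,\alpha}\},
\end{aligned} \tag{6.9}$$

where

$$c = \frac{{}_0\sigma_y^2}{\sigma_y^2} \tag{6.10}$$

and

$$\lambda = \frac{n(\mu_y - b/a)^2}{\sigma_y^2}.$$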

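The constant $\chi^2_{0,\alpha}$ in (6.6) and the operating characteristic (6.9) require the noncentral $\chi^2$ distribution, for which, as noted above, extensive tables were not available. As a sketch under our own assumptions, the code below evaluates the noncentral $\chi^2$ distribution function as the Poisson mixture of central $\chi^2$ distribution functions that underlies (6.5), finds $\chi^2_{0,\alpha}$ by bisection, and then evaluates (6.9) and (6.10). All function names are hypothetical, the series are intended only for moderate arguments such as those arising here, and this is not the authors' method (they point to the approximations of Patnaik and Abdel-Aty).

```python
import math


def log_scale_params(mu_x):
    """(3.5): (mu_y, sigma_y) for a lognormal x with mean mu_x and s.d. 1."""
    return (math.log(mu_x**2 / math.sqrt(1.0 + mu_x**2)),
            math.sqrt(math.log1p(1.0 / mu_x**2)))


def gamma_p(s, x, tol=1e-15, max_terms=10_000):
    """Regularized lower incomplete gamma P(s, x) by its power series (moderate x only)."""
    if x <= 0.0:
        return 0.0
    term = math.exp(s * math.log(x) - x - math.lgamma(s + 1.0))
    if term == 0.0:          # underflow: P(s, x) is negligible for the small x met here
        return 0.0
    total = term
    for k in range(1, max_terms):
        term *= x / (s + k)
        total += term
        if term < tol * total:
            break
    return min(total, 1.0)


def ncx2_cdf(x, df, lam, tol=1e-14):
    """Noncentral chi-square CDF: Poisson(lam / 2) mixture of central chi-square CDFs."""
    if x <= 0.0:
        return 0.0
    if lam == 0.0:
        return gamma_p(df / 2.0, x / 2.0)
    half, total, j = lam / 2.0, 0.0, 0
    while True:
        weight = math.exp(-half + j * math.log(half) - math.lgamma(j + 1.0))
        total += weight * gamma_p(df / 2.0 + j, x / 2.0)
        if j > half and weight < tol:
            return total
        j += 1


def t3_constants(mu0_x, mu1_x, n, alpha=0.05):
    """Centre b/a of the region (6.7) and chi2_{0,alpha} solving (6.6) by bisection."""
    mu0_y, s0 = log_scale_params(mu0_x)
    mu1_y, s1 = log_scale_params(mu1_x)
    a = 1.0 / s1**2 - 1.0 / s0**2          # positive, since s1 < s0 when mu1_x > mu0_x
    b = mu1_y / s1**2 - mu0_y / s0**2
    centre = b / a
    lam0 = n * (mu0_y - centre) ** 2 / s0**2
    lo, hi = 0.0, 1.0
    while ncx2_cdf(hi, n, lam0) < alpha:   # bracket the lower alpha point
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if ncx2_cdf(mid, n, lam0) < alpha:
            lo = mid
        else:
            hi = mid
    return centre, s0, 0.5 * (lo + hi)


def oc_t3(mu0_x, mu1_x, mu_x, n, alpha=0.05):
    """(6.9)-(6.10): probability that T3 accepts H0 when the mean of x is mu_x."""
    centre, s0, chi2_crit = t3_constants(mu0_x, mu1_x, n, alpha)
    mu_y, s = log_scale_params(mu_x)
    c = s0**2 / s**2
    lam = n * (mu_y - centre) ** 2 / s**2
    return 1.0 - ncx2_cdf(c * chi2_crit, n, lam)


# Hypothetical design: H0 mean 1, H1 mean 2, n = 4.
for mu_x in (1.0, 1.5, 2.0):
    print(mu_x, round(oc_t3(1.0, 2.0, mu_x, 4), 3))
```

Because the critical value is found from the same series that later evaluates (6.9), the operating characteristic at $\mu_x = {}_0\mu_x$ returns $1 - \alpha$ up to bisection error, which is a convenient internal check.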
The operating characteristics were computed for the cases shown in the column headings of Table III. The calculated values are given in Table III and the corresponding curves in Fig. 3.







TABLE III

Probabilities that the $T_3$-test with sample size $n$ designed to test $H_0: \mu_x = {}_0\mu_x$ against $H_1: \mu_x = {}_1\mu_x$ will accept $H_0$ when $\mu_x = {}_0\mu_x + \delta$










n=4 | m= 25 

6/ipe uz | ous: = 10 | we: = 1 
2 | 10 | li 2 

0.0 95 95 95 95 
0. .79 
0.2 85 .90 49 
0.3 14 
0.4 64 .67 .80 .02 
0.6 39 .67 .00 
0.8 17 .52 .00 
1.0 .06 .08 36 
1.2 .0O1 19 
1.4 00 .00 12 
1.6 .00 .05

[Fig. 3. Operating Characteristics for the $T_3$ Test (vertical axis: probability of acceptance; horizontal axis: standard deviations from ${}_0\mu_x$).]
7. Asymptotic properties of the tests for large values of the null hypothesis. If, instead of being a logarithmico-normal variate, $x$ were actually $N(\mu_x, 1)$, then the most powerful test of $H_0$ against $H_1$ would be characterized by the critical region

$$T = \left\{(x_1,\dots,x_n)\;\middle|\;\frac{\bar x - {}_0\mu_x}{1/\sqrt n} \ge z_\alpha\right\}, \tag{7.1}$$

and the corresponding operating characteristic would be

$$\beta_T = \Phi(z_\alpha - \delta\sqrt n). \tag{7.2}$$

[Fig. 4. Comparison of the Operating Characteristics for the $T$, $T_1$, $T_2$, $T_3$ Tests for $n = 4$, ${}_0\mu_x = 1$ (vertical axis: probability of acceptance; horizontal axis: standard deviations from ${}_0\mu_x$).]

The operating characteristics for the $T$, $T_1$, $T_2$, and $T_3$ tests for $n = 4$ and ${}_0\mu_x = 1$ are plotted together in Fig. 4, where the scale measures the number of standard deviations away from the hypothesized mean. Similar curves are plotted together in Fig. 5 for the case $n = 4$ and ${}_0\mu_x = 10$. An examination of these curves indicates that the power depends not only on the specific test being used, but also on the specific value of the null hypothesis. In fact, the $T_1$, $T_2$, and $T_3$ operating characteristics given in Fig. 5 cluster more closely about the $T$ operating characteristic than do those in Fig. 4. This suggests that, as the hypothesized mean is increased, all three operating characteristics may approach the operating characteristic of the $T$-test. Specific results of this nature will now be proven.

Throughout this section ${}_0\mu_x$ will be written simply as $\mu$. Furthermore, an alternative $\mu_x > {}_0\mu_x$ will be written as

$$\mu_x = \mu + \delta, \tag{7.3}$$








where $\delta$ represents the number of standard deviations from the hypothesized mean.

[Fig. 5. Comparison of the Operating Characteristics for the $T$, $T_1$, $T_2$, $T_3$ Tests for $n = 4$, ${}_0\mu_x = 10$ (horizontal axis: standard deviations from ${}_0\mu_x$).]

Thus the null hypothesis $H_0: \mu_x = {}_0\mu_x$ and the alternative $H_1: \mu_x = {}_1\mu_x > {}_0\mu_x$ become

$$H_0: \delta = 0, \qquad H_1: \delta = \delta_1, \tag{7.4}$$

where $\delta_1 = {}_1\mu_x - \mu$.

The $T_1$-Test. The behavior of the $T_1$-test for large $\mu$ is summarized in the following theorem:

THEOREM 1. $\lim_{\mu\to\infty}\beta_{T_1} = \beta_T$.

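PROOF. Using the notation (7.3), the limit of the operating characteristic (4.4) of the $T_1$-test may be written as

$$\lim_{\mu\to\infty}\beta_{T_1} = \lim_{\mu\to\infty}\Phi\left(z_\alpha\,\frac{\sqrt{\ln(1+1/\mu^2)}}{\sqrt{\ln(1+1/(\mu+\delta)^2)}} + \sqrt n\,\frac{\ln\dfrac{\mu^2}{\sqrt{1+\mu^2}} - \ln\dfrac{(\mu+\delta)^2}{\sqrt{1+(\mu+\delta)^2}}}{\sqrt{\ln(1+1/(\mu+\delta)^2)}}\right). \tag{7.5}$$

Since $\Phi(z)$ is a continuous function of $z$, it is valid to take the limit sign inside the function.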


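By using the expansion of the function $\ln(1 + 1/z)$ in the neighborhood of $z = \infty$, it is seen that as $\mu$ approaches infinity

$$\frac{\sqrt{\ln(1+1/\mu^2)}}{\sqrt{\ln(1+1/(\mu+\delta)^2)}} = \frac{\mu^{-1}\sqrt{1+O(\mu^{-2})}}{(\mu+\delta)^{-1}\sqrt{1+O\left((\mu+\delta)^{-2}\right)}} \to 1. \tag{7.6}$$

Note also that

$$\ln\frac{\mu^2}{\sqrt{1+\mu^2}} - \ln\frac{(\mu+\delta)^2}{\sqrt{1+(\mu+\delta)^2}} = -\ln\left(1+\frac{\delta}{\mu}\right) - \frac12\ln\left(1+\frac{1}{\mu^2}\right) + \frac12\ln\left(1+\frac{1}{(\mu+\delta)^2}\right) = -\frac{\delta}{\mu} + O(\mu^{-2}),$$

so that

$$\frac{\ln\dfrac{\mu^2}{\sqrt{1+\mu^2}} - \ln\dfrac{(\mu+\delta)^2}{\sqrt{1+(\mu+\delta)^2}}}{\sqrt{\ln(1+1/(\mu+\delta)^2)}} = \frac{-\delta/\mu + O(\mu^{-2})}{(\mu+\delta)^{-1}\sqrt{1+O\left((\mu+\delta)^{-2}\right)}} \to -\delta. \tag{7.7}$$

Hence, if one uses statements (7.6) and (7.7), the limiting operating characteristic of $T_1$ becomes

$$\lim_{\mu\to\infty}\beta_{T_1} = \Phi(z_\alpha - \delta\sqrt n),$$

or

$$\lim_{\mu\to\infty}\beta_{T_1} = \beta_T.$$

Thus, for large values of the hypothesized mean, the $T_1$-test behaves like the $T$-test.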

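Theorem 1 can be illustrated numerically. The sketch below (illustrative names; $\Phi$ from Python's `statistics.NormalDist`) evaluates (4.4) in the notation of (7.3) alongside (7.2) and prints the gap between the two operating characteristics for increasing $\mu$. It is a numerical illustration under these assumptions, not a substitute for the proof.

```python
import math
from statistics import NormalDist

PHI = NormalDist()


def log_scale_params(mu_x):
    """(3.5) with sigma_x = 1."""
    return (math.log(mu_x**2 / math.sqrt(1.0 + mu_x**2)),
            math.sqrt(math.log1p(1.0 / mu_x**2)))


def oc_t1(mu, delta, n, alpha=0.05):
    """(4.4) in the notation of (7.3): null mean mu, true mean mu + delta, as arranged in (7.5)."""
    z_alpha = PHI.inv_cdf(1.0 - alpha)
    mu0_y, sigma0_y = log_scale_params(mu)
    mu_y, sigma_y = log_scale_params(mu + delta)
    return PHI.cdf(z_alpha * sigma0_y / sigma_y + math.sqrt(n) * (mu0_y - mu_y) / sigma_y)


def oc_t(delta, n, alpha=0.05):
    """(7.2): OC of the most powerful test when x really is N(mu_x, 1)."""
    return PHI.cdf(PHI.inv_cdf(1.0 - alpha) - delta * math.sqrt(n))


# Gap between the T1 and T operating characteristics at delta = 0.5, n = 4.
for mu in (1.0, 10.0, 100.0, 1000.0):
    print(mu, abs(oc_t1(mu, 0.5, 4) - oc_t(0.5, 4)))
```

The two ratios inside $\Phi$ in `oc_t1` are exactly the quantities whose limits are taken in (7.6) and (7.7).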
The $T_2$-test. A theorem similar to the one proved above will be shown for the $T_2$-test. The proof involves interchanging the limit and integral signs. As the justification for this, one could use the Lebesgue Dominated Convergence theorem ([6], p. 66), which involves finding an integrable function that bounds the absolute value of the integrand. In cases where the integrand, say $p_n(x)$, is a proper density for all $n$, Scheffé [7] has shown that a sufficient condition for demonstrating the existence of such a bounding and integrable function is that the limit (as $n$ tends to infinity) of the integrand is also a proper density, say $p(x)$.

It is now possible to proceed to

THEOREM 2. $\lim_{\mu\to\infty}\beta_{T_2} = \beta_T$.

























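PROOF. The operating characteristic for the $T_2$-test given in (5.7) may be written as

$$\beta_{T_2} = \int\cdots\int_{\bar T_2}\prod_{i=1}^n\frac{1}{\sigma_y x_i\sqrt{2\pi}}\,e^{-(\ln x_i - \mu_y)^2/2\sigma_y^2}\,dx_1\cdots dx_n, \tag{7.8}$$

where $\bar T_2$ is the complement of $T_2$ and is given by

$$\bar T_2 = \left\{(x_1,\dots,x_n)\;\middle|\;\frac{\bar x - {}_0\mu_x}{1/\sqrt n} < z_\alpha\right\}. \tag{7.9}$$

Using the notation of (7.3) and letting

$$w_i = x_i - \mu, \tag{7.10}$$

one can write (7.8) as

$$\beta_{T_2} = \int\cdots\int_{\bar T_2'}\prod_{i=1}^n\frac{1}{\sigma_y(w_i+\mu)\sqrt{2\pi}}\,e^{-[\ln(w_i+\mu) - \mu_y]^2/2\sigma_y^2}\,dw_1\cdots dw_n, \tag{7.11}$$

where

$$\bar T_2' = \left\{(w_1,\dots,w_n) \mid \bar w < z_\alpha/\sqrt n\right\}. \tag{7.12}$$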


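Scheffé's theorem is now used to justify bringing the limit sign under the integral sign. Note first that the integrand of (7.11) is a proper density, so that it remains to show that the limit of this density is also a proper density. Also, since the limit of the product is the product of the limits, it is only necessary to consider the behavior of one such factor, namely,

$$f_\mu(w_i;\delta) = \frac{1}{\sqrt{2\pi}\,(w_i+\mu)\sqrt{\ln\left(1+\frac{1}{(\mu+\delta)^2}\right)}}\exp\left\{-\frac{\left[\ln(w_i+\mu) - \ln\frac{(\mu+\delta)^2}{\sqrt{1+(\mu+\delta)^2}}\right]^2}{2\ln\left(1+\frac{1}{(\mu+\delta)^2}\right)}\right\}, \tag{7.13}$$

which will henceforth be written as

$$f_\mu(w_i;\delta) = \frac{1}{\sqrt{2\pi}\,A_1}\,e^{-A_2^2/2}. \tag{7.14}$$

Now, for a fixed $w_i$, as $\mu \to \infty$,

$$A_1 = (w_i+\mu)\sqrt{\ln\left(1+\frac{1}{(\mu+\delta)^2}\right)} = \frac{w_i+\mu}{\mu+\delta}\sqrt{1+O\left((\mu+\delta)^{-2}\right)} \to 1. \tag{7.15}$$

Furthermore,

$$A_2 = \frac{\ln\dfrac{w_i+\mu}{\mu+\delta} + \dfrac12\ln\left(1+\dfrac{1}{(\mu+\delta)^2}\right)}{\sqrt{\ln\left(1+\dfrac{1}{(\mu+\delta)^2}\right)}} = \frac{\dfrac{w_i-\delta}{\mu+\delta} + O\left((\mu+\delta)^{-2}\right)}{(\mu+\delta)^{-1}\sqrt{1+O\left((\mu+\delta)^{-2}\right)}},$$

so that for a fixed $w_i$

$$\lim_{\mu\to\infty}A_2^2 = (w_i-\delta)^2. \tag{7.16}$$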
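Hence, if one makes use of (7.15) and (7.16), it follows that

$$\lim_{\mu\to\infty}f_\mu(w_i;\delta) = \frac{1}{\sqrt{2\pi}}\,e^{-(w_i-\delta)^2/2}, \tag{7.17}$$

which is a proper density. Therefore, since under this limiting density $\bar w$ is $N(\delta, 1/n)$,

$$\lim_{\mu\to\infty}\beta_{T_2} = \int\cdots\int_{\bar w < z_\alpha/\sqrt n}\prod_{i=1}^n\frac{1}{\sqrt{2\pi}}\,e^{-(w_i-\delta)^2/2}\,dw_1\cdots dw_n = \int_{-\infty}^{z_\alpha/\sqrt n}\frac{\sqrt n}{\sqrt{2\pi}}\,e^{-(\xi-\delta)^2/2(1/n)}\,d\xi = \Phi(z_\alpha - \delta\sqrt n),$$

and so

$$\lim_{\mu\to\infty}\beta_{T_2} = \beta_T.$$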

The proof of Theorem 2, above, suggests an interesting property of the logarithmico-normal distribution, which is summarized in the following corollary.

COROLLARY 1. The standardized logarithmico-normal variate $w = x - \mu_x$ is distributed asymptotically $N(0, 1)$ as $\mu_x \to \infty$.

PROOF. The result follows immediately from that part of the proof of Theorem 2 where it was shown that

$$\lim_{\mu\to\infty}f_\mu(w;\delta) = \frac{1}{\sqrt{2\pi}}\,e^{-(w-\delta)^2/2}.$$

(The mean of $w$ may be taken as zero, so that $\delta = 0$, which means $\mu = \mu_x$.) The theorem of Scheffé states that this is sufficient to show that

$$\lim_{\mu\to\infty}\int_S f_\mu(w;0)\,dw = \int_S\frac{1}{\sqrt{2\pi}}\,e^{-w^2/2}\,dw$$

for all Borel sets $S$ in $R$.

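Scheffé's theorem also bounds the error over all Borel sets at once: $\sup_S\left|\int_S f_\mu(w;0)\,dw - \int_S\varphi(w)\,dw\right| = \tfrac12\int|f_\mu(w;0) - \varphi(w)|\,dw$. The following sketch (our own names; a crude midpoint rule on a fixed grid, which is an assumption adequate only for illustration) estimates that $L_1$ distance between the density of $w = x - \mu_x$ and the standard normal density for several $\mu_x$, a distance which Corollary 1 says must tend to zero.

```python
import math
from statistics import NormalDist

PHI = NormalDist()


def standardized_lognormal_pdf(w, mu_x):
    """Density f_mu(w; 0) of w = x - mu_x, x lognormal with mean mu_x and s.d. 1."""
    x = w + mu_x
    if x <= 0.0:
        return 0.0
    mu_y = math.log(mu_x**2 / math.sqrt(1.0 + mu_x**2))     # (3.5)
    sigma_y = math.sqrt(math.log1p(1.0 / mu_x**2))
    z = (math.log(x) - mu_y) / sigma_y
    return math.exp(-0.5 * z * z) / (x * sigma_y * math.sqrt(2.0 * math.pi))


def l1_distance_to_normal(mu_x, lo=-30.0, hi=30.0, steps=60_000):
    """Midpoint-rule estimate of the integral of |f_mu(w; 0) - phi(w)| over [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        w = lo + (i + 0.5) * h
        total += abs(standardized_lognormal_pdf(w, mu_x) - PHI.pdf(w))
    return total * h


for mu_x in (1.0, 10.0, 100.0):
    d = l1_distance_to_normal(mu_x)
    # Half the L1 distance is the largest probability discrepancy over Borel sets.
    print(mu_x, round(d, 4), round(d / 2, 4))
```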
Another property of the logarithmico-normal distribution follows readily from this corollary.

COROLLARY 2. The standardized logarithmico-normal variate $w = x - \mu_x$ is distributed asymptotically $N(0, 1)$ as $\sigma_y \to 0$.

PROOF. Corollary 1 showed that $x - \mu_x$ is asymptotically normal as $\mu_x \to \infty$. According to (3.5),

$$\sigma_y^2 = \ln\left[1 + \frac{1}{\mu_x^2}\right],$$

which means $\mu_x \to \infty$ if and only if $\sigma_y \to 0$. Hence, $x - \mu_x$ is asymptotically $N(0, 1)$ as $\sigma_y \to 0$.

The result of Corollary 2 was also obtained by Yuan [8], who considered the normal variate $y = (1/c)\ln[(x - a)/b]$ and showed that

lim y = ae, 

c0 G 
This, according to Yuan, would imply that $x$ is asymptotically normal as $c$ approaches zero. The quantity $c$ corresponds to $\sigma_y$.

The $T_3$-test. One would expect a result similar to Theorems 1 and 2 to hold for the $T_3$-test. Since the logarithmico-normal distribution approaches the normal distribution as $\mu_x \to \infty$, one would conjecture that for large values of $\mu$ the most powerful tests based on the two distributions could be interchanged with a guarantee of similar calculated risks.

The theorem corresponding to those given above reads:

THEOREM 3. $\lim_{\mu\to\infty}\beta_{T_3} = \beta_T$.

The details of this proof, which are not included here, may be found in Severo [9]. The theorem is proved as a special case of more general results, which are summarized in two theorems. The first is concerned with the uniqueness of the most powerful critical region for testing a simple hypothesis as a parameter of the distribution is allowed to pass to its limit. This uniqueness is demonstrated up to a set of measure less than an $\epsilon > 0$. The second theorem then justifies the convergence of the power function to the power function of the limiting critical region.


8. Acknowledgments. The authors are indebted to Mr. Robert E. Odeh for 
his generous assistance in carrying out much of the computation. Also, the 
authors wish to thank Dr. H. L. Harter for his helpful suggestions for the im- 
provement of this paper. 


REFERENCES

[1] D. J. Finney, "On the distribution of a variate whose logarithm is normally distributed," J. Roy. Stat. Soc. Suppl., Vol. 7 (1941), pp. 155-161.

[2] H. Cramér, Mathematical Methods of Statistics, Princeton University Press, Princeton, 1951.

[3] P. B. Patnaik, "The non-central χ² and F-distributions and their applications," Biometrika, Vol. 36 (1949), pp. 202-232.

[4] S. H. Abdel-Aty, "Approximate formulae for the percentage points and the probability integral of the non-central χ² distribution," Biometrika, Vol. 41 (1954), pp. 538-540.

[5] E. S. Pearson and H. O. Hartley, Biometrika Tables for Statisticians, Vol. 1, Cambridge University Press, Cambridge, Mass., 1954.

[6] M. E. Munroe, Introduction to Measure and Integration, Addison-Wesley Publishing Company, Inc., Cambridge, Mass., 1953.

[7] H. Scheffé, "A useful convergence theorem for probability distributions," Ann. Math. Stat., Vol. 18 (1947), pp. 434-438.

[8] Pae-Tsi Yuan, "On the logarithmic frequency distribution and the semi-logarithmic correlation surface," Ann. Math. Stat., Vol. 4 (1933), pp. 30-74.

[9] N. C. Severo, "A comparison of tests on the mean of a logarithmico-normal distribution with known variance," (Unpublished Thesis), Library of the Carnegie Institute of Technology, 1955.





TWO-SAMPLE PROCEDURES IN SIMULTANEOUS ESTIMATION¹
By W. C. HEALY, JR.²
University of Illinois


1. Summary. In this paper, two-sample procedures of the type originated by Stein [4] are developed for a number of problems in simultaneous estimation. The results include the construction of simultaneous confidence intervals of prescribed length or lengths and confidence coefficient $1 - \alpha$ for (1) all normalized linear functions of means, (2) all differences between means, and (3) the means of $k$ independent normal populations with common unknown variance. Simultaneous confidence intervals of length $l$ and confidence coefficients known to be not less than $1 - \alpha$ are constructed for all normalized linear functions of the means of a general multivariate normal population. The single-sample analogues of these problems have been discussed by Tukey [5], Scheffé [6], and Bose and Roy [7]. Also, a confidence region having prescribed diameter (or volume) and confidence coefficient $1 - \alpha$ is constructed for the mean vector in the general multivariate normal case.

The procedures depend only on known and tabulated distributions. Illustrative 
applications from the analysis of variance are described. 


2. Introduction. In 1940, Dantzig [2] showed that for the Student problem

$$\text{Hypothesis: } \mu = \mu_0, \qquad \text{Alternative: } \mu \ne \mu_0, \tag{2.1}$$

where $\mu$ is the unknown mean of a normal distribution with unknown variance $\sigma^2$, there exists no test having power independent of $\sigma^2$ based on a sample of fixed size. More generally, it is shown in [3], Sec. 5.2, that if $\theta$ is a location parameter and an unknown scale parameter is present, there exist neither confidence intervals of prescribed length and confidence coefficient nor point estimates with bounded expected squared error for $\theta$. The important general problem has thus been posed: how to conduct experiments in order to obtain a predetermined degree of accuracy in the presence of unknown scale parameters.

In 1945, Stein [4] provided an ingenious solution by sampling in two stages for the case of the Student hypothesis (2.1) and, in fact, for a general linear hypothesis. In his procedure the size of the sample at the second stage depends on the results of the first. He also provided, in the same vein, a two-sample technique for obtaining a confidence interval for $\mu$ having predetermined length and

Received December 13, 1954.
¹ This work was supported in part by the Office of Ordnance Research, U.S. Army, under Contract No. DA-11-022-ORD-881.
² Now with the Ethyl Corporation Research Laboratories, Ferndale, Michigan.
687 





688 W. C. HEALY, JR. 


confidence coefficient. By this is meant a rule for constructing an interval (which is a function of the observations) with the two properties that

(a) the length of the interval is equal to $l$;
(b) the probability that the interval contains the true value of $\mu$ is exactly $1 - \alpha$,

where $l$ and $1 - \alpha$ have been specified in advance. Recently, Seelbinder [10] has published tables of the expected total sample size for Stein's procedure.

Problems of simultaneous estimation and simultaneous tests of hypotheses 
constitute a dilemma to practicing statisticians. A common example was long 
provided in the application of the analysis of variance when the F-test had re- 
jected the hypothesis of homogeneity of means. The natural desire of the experi- 
menter to make further inferences about the means, such as deciding between 
which groups of means differences existed, was thwarted by existing statistical 
theory before 1950. Analysis of variance theory made no provision for such 
successive inferences, and the experimenter who proceeded anyway accepted the 
hazard of an unknown significance level for his final conclusions. 

Work by Tukey [5], Scheffé [6], Bose and Roy [7], Dunnett [13], and others 
since 1950 has produced valid techniques for making such simultaneous or suc- 
cessive comparisons, and, importantly, including comparisons suggested by the 
data themselves. In a number of problems involving normal distributions, and 
including the F-test dilemma above, the techniques are easy to apply. This 
contributes to their practical importance. 

In this paper, the Stein two-sample idea for obtaining predetermined accuracy 
is applied to some of the simultaneous confidence interval problems considered 
in [5], [6], and [7]. Three basic problems which are appropriate to a variety of 
applications are treated first, with examples. They involve k independent normal 
populations with unknown means and unknown common variance. 

Suppose that a (joint) confidence coefficient $1 - \alpha$ is prescribed. Then Problem I is to construct a system of confidence intervals of prescribed lengths $l_1, l_2, \dots, l_k$ for the $k$ means. Problem II is to construct a system of confidence intervals, each of prescribed length $l$, for the $k(k-1)/2$ differences between the means. Problem III is to construct a system of confidence intervals, one for each possible normalized linear function of the $k$ means; each interval is to have length $l$, where $l$ is specified in advance. These systems of intervals are each to have joint confidence coefficient $1 - \alpha$. For a number of applied situations the comparisons of interest are reducible to those of Problems I, II, or III.

In addition, Problems IV and V involving the means of a k-variate normal 
population with unknown covariance matrix are treated. Problem IV is to con- 
struct a system of confidence intervals, one for each possible linear function of the 
k means. The length and confidence coefficient specifications are as in Problem 
III. Problem V is to construct a confidence region for the k means, having a 
prescribed maximum diameter and confidence coefficient. Problems IV and V 
are thought to be the first multivariate-normal two-sample procedures. 





TWO-SAMPLE PROCEDURES 689


Mention might here be made of the application of the Stein idea to the problem 
of ranking means of normal populations, made by Bechhofer, Dunnett and 
Sobel [1]. 

In Section 3 are stated the distribution results employed for the solution of these problems. In Section 4 the univariate problems and solutions are given; in Section 5, three analysis of variance situations are shown, for illustrative purposes, to correspond to Problems I, II, and III, and hence are solved. In Section 6 the multivariate problems and solutions are described, and the solutions are justified.


3. Distribution results. 

THEOREM 3.1 (Stein). Let $X_{i1}, X_{i2}, \dots$ $(i = 1, 2, \dots, k)$ be mutually independent random variables, $X_{ih}$ being distributed $N(\theta_i, \sigma^2)$. Let $n_1, n_2, \dots, n_k$ be fixed nonnegative integers. Let $s^2$ be an unbiased estimate of $\sigma^2$ based on $m$ degrees of freedom, distributed independently of $\sum_{h=1}^{n_i}X_{ih}$ and $X_{i,n_i+1}, X_{i,n_i+2}, \dots$ $(i = 1, 2, \dots, k)$. Let $a_{i1}, a_{i2}, \dots, a_{iN_i}$, $a_i$, and $N_i$ be functions of $s^2$ such that (1) $a_{ih} = a_i$ for $h \le n_i$ and (2) $N_i \ge \max(n_i, 1)$. Define

$$Y_i = \frac{\sum_{h=1}^{N_i}a_{ih}(X_{ih} - \theta_i)}{\left(\sum_{h=1}^{N_i}a_{ih}^2\right)^{1/2}}.$$

Then $Y_1, Y_2, \dots, Y_k, s$ are mutually independent random variables, $Y_i$ being distributed $N(0, \sigma^2)$.

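Theorem 3.1 is the key to all the procedures that follow: although the total sample size depends on $s^2$, the statistic $Y_i$ is exactly $N(0, \sigma^2)$ and independent of $s$. A small Monte Carlo sketch for $k = 1$ with equal weights $a_{ih} = 1$ (our simplifying assumption; the theorem allows general weights), with $s^2$ taken from the first stage, illustrates this. The parameter values and names are hypothetical.

```python
import math
import random
import statistics


def stein_two_stage(theta, sigma, n1, c, rng):
    """One replicate of Theorem 3.1 for k = 1 with equal weights a_h = 1 (an assumption).

    First stage: n1 observations give s^2 on m = n1 - 1 degrees of freedom.
    Total size N = max(n1, [s^2 / c] + 1), where [x] is the greatest integer below x,
    so that [x] + 1 equals ceil(x).  Returns (Y, s) with Y = sum(X_h - theta) / sqrt(N).
    """
    first = [rng.gauss(theta, sigma) for _ in range(n1)]
    s2 = statistics.variance(first)                 # unbiased, independent of sum(first)
    n_total = max(n1, math.ceil(s2 / c))
    second = [rng.gauss(theta, sigma) for _ in range(n_total - n1)]
    y = sum(v - theta for v in first + second) / math.sqrt(n_total)
    return y, math.sqrt(s2)


rng = random.Random(2024)
draws = [stein_two_stage(theta=3.0, sigma=2.0, n1=5, c=0.5, rng=rng) for _ in range(20_000)]
ys = [y for y, _ in draws]
# Theorem 3.1 predicts mean 0 and variance sigma^2 = 4, whatever the realised N.
print(statistics.fmean(ys), statistics.pvariance(ys))
```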
COROLLARY 3.2. Define $W_1 = \max_{i\le k}(|Y_i|/s)$. Then $W_1$ has the distribution of the Studentized Maximum Modulus with $(k, m)$ degrees of freedom.

This distribution is tabulated in [11].

COROLLARY 3.3. Define $W_2 = \max_{i,j\le k}|Y_i - Y_j|/s$. Then $W_2$ has the distribution of the Studentized Range with $(k, m)$ degrees of freedom.

This distribution is tabulated in [8].

COROLLARY 3.4. Define $W_3 = \sum_{i=1}^k Y_i^2/s^2$. Then $W_3/k$ has the F-distribution with $(k, m)$ degrees of freedom.

These three corollaries are the distribution results required for the univariate 
examples that are to follow. A multivariate analogue of Theorem 3.1 is stated 
next. 

THEOREM 3.5. Let $X_h = (X_{1h}, X_{2h}, \dots, X_{kh})'$ $(h = 1, 2, \dots)$ be mutually independent random vectors,³ the $X_h$ having a multivariate normal distribution with mean vector $\theta$ and covariance matrix $\Sigma = (\sigma_{ij})$. Let $n$ be a fixed nonnegative integer. Let $S = (S_{ij})$ be a matrix of unbiased estimates $S_{ij}$ of the $\sigma_{ij}$, the $S_{ij}$ having jointly a Wishart distribution with $m$ degrees of freedom. Suppose the $S_{ij}$ are independent of both $\sum_{j=1}^n X_j$ and $X_{n+1}, X_{n+2}, \dots$. Let $a_1, a_2, \dots, a_N$, $a$, and $N$ be functions of $S$ such that (1) $a_j = a$ for $j \le n$, and (2) $N \ge \max(n, 1)$. Define

$$Y = \left(\sum_{j=1}^N a_j^2\right)^{-1/2}\sum_{j=1}^N a_j(X_j - \theta).$$

Then

$$\frac{m-k+1}{k}\,Y'S^{-1}Y$$

has the F-distribution with $(k, m-k+1)$ degrees of freedom.

³ A prime will be used to denote the transpose of a vector or matrix.
The proof is similar to the univariate case and will be omitted. 


4. Univariate problems and solutions.

4.1. Problem I: Estimation of Means. The Problems I, II, and III of this section deal with the following situation: $X_{i1}, X_{i2}, \dots$ $(i = 1, 2, \dots, k)$ are mutually independent random variables. The distribution of $X_{ih}$ is $N(\theta_i, \sigma^2)$, $\theta_i$ and $\sigma^2$ being unknown.

Nonnegative integers $n_1, n_2, \dots, n_k$ have somehow been determined, and $X_{i1}, X_{i2}, \dots, X_{in_i}$ $(i = 1, 2, \dots, k)$ have been observed if $n_i > 0$. If $n_i = 0$, no observations have been taken on the $i$th distribution.

Let $s^2$ be an unbiased estimate of $\sigma^2$ based on $m$ degrees of freedom, which is independent of both $\sum_{h=1}^{n_i}X_{ih}$ and $X_{i,n_i+1}, X_{i,n_i+2}, \dots$. A value of $s$ has been observed; it may or may not have been computed from the observations $X_{ih}$, $h \le n_i$.

With this set-up we can now state problems and solutions. First, a remark is in order about the statements of the problems. A confidence coefficient $1 - \alpha$ will be prescribed in advance, and it can be exactly attained. However, we will require only that the actual confidence coefficient attained shall be $\ge 1 - \alpha$. The reason is that the solutions then obtained are uniform improvements on the solutions obtained by requiring exactly $1 - \alpha$ confidence coefficient. This same situation is encountered and discussed in Stein's original paper [4].

STATEMENT OF ProsiemM I. Given 0 < 1 — a < landd,,h,---,k, with 
l; > 0, determine joint confidence intervals J;(X) for 6; , 02, --- , 0 such that 

(1) length of JX) = 1;, 

(2) Pr {6;° 7X) for alli S k} = 1 — q@ for all |, 0, ---,&,o. It is of 
interest to note that in the single sample analogue of this problem, given, for 
example, in [7], p. 519, the k confidence intervals are obliged to have the same 
(random) length, while here the lengths are allowed to differ. 

SOLUTION OF PROBLEM I. Determine constants $c_i$ such that

$$\Pr\left\{W_1 \le \frac{l_i}{2c_i^{1/2}}\right\} = 1 - \alpha, \qquad i = 1, 2, \ldots, k,$$

when $W_1$ has the distribution of the Studentized Maximum Modulus with $(k, m)$
degrees of freedom. Determine integers $N_i$ by

$$N_i = \max\{n_i, [s^2/c_i] + 1\}, \tag{4.1.1}$$

where $[\ ]$ means "greatest integer less than."
Observe $X_{i,n_i+1}, X_{i,n_i+2}, \ldots, X_{iN_i}$ $(i = 1, 2, \ldots, k)$ if $N_i > n_i$, and estimate
$\theta_i$ by the interval

$$\frac{1}{N_i}\sum_{h=1}^{N_i} X_{ih} \pm \frac{l_i}{2}. \tag{4.1.2}$$
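A minimal sketch of the Problem I rule follows. It assumes the upper $\alpha$ point $w$ of the Studentized Maximum Modulus with $(k, m)$ degrees of freedom has already been read from a table (the name `w` and the function names are ours, not the paper's). Choosing $c_i = (l_i/2w)^2$ makes $l_i/(2c_i^{1/2}) = w$ for every $i$, so one tabulated point serves all $k$ lengths; the bracket "greatest integer less than $x$" is computed as `math.ceil(x) - 1`.

```python
import math

def greatest_integer_less_than(x):
    """[x] in the sense of (4.1.1): the greatest integer strictly less than x."""
    return math.ceil(x) - 1

def problem_one_sizes(s2, n, lengths, w):
    """Total sample sizes N_i = max{n_i, [s^2/c_i] + 1} for Problem I.

    s2      -- observed value of s^2 (m degrees of freedom)
    n       -- initial sample sizes n_1, ..., n_k
    lengths -- prescribed interval lengths l_1, ..., l_k
    w       -- assumed tabulated upper-alpha point of the Studentized
               Maximum Modulus with (k, m) degrees of freedom
    """
    sizes = []
    for n_i, l_i in zip(n, lengths):
        c_i = (l_i / (2.0 * w)) ** 2          # makes l_i / (2 sqrt(c_i)) equal to w
        sizes.append(max(n_i, greatest_integer_less_than(s2 / c_i) + 1))
    return sizes

def problem_one_interval(observations, length):
    """Interval (4.1.2): mean of all N_i observations +/- l_i / 2."""
    mean = sum(observations) / len(observations)
    return mean - length / 2, mean + length / 2
```

With $s^2 = 3$, $w = 2$, initial sizes $(5, 20)$ and lengths $(1, 2)$, the sketch returns $N = (48, 20)$, so only the first population needs further observations.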

4.2 Problem II: Estimation of Differences.

STATEMENT OF PROBLEM II. Here we will take $n_1 = n_2 = \cdots = n_k = n$.
Given $0 < 1 - \alpha < 1$ and $l > 0$, determine joint confidence intervals $I_{ij}(X)$
for the $k(k-1)/2$ differences $\theta_i - \theta_j$, $i < j \le k$, such that

(1) length of $I_{ij}(X) = l$,

(2) $\Pr\{\theta_i - \theta_j \in I_{ij}(X) \text{ for all } i < j \le k\} \ge 1 - \alpha$ for all $\theta_1, \theta_2, \ldots, \theta_k, \sigma$.


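SOLUTION OF PROBLEM II. Determine a constant $c$ such that

$$\Pr\left\{W_2 \le \frac{l}{2c^{1/2}}\right\} = 1 - \alpha$$

when $W_2$ has the distribution of the Studentized Range with $(k, m)$ degrees of
freedom. Determine $N$ by

$$N = \max\{n, [s^2/c] + 1\}. \tag{4.2.1}$$

Observe $X_{i,n+1}, X_{i,n+2}, \ldots, X_{iN}$ $(i = 1, 2, \ldots, k)$ if $N > n$, and estimate
$\theta_i - \theta_j$ by the interval

$$\frac{1}{N}\sum_{h=1}^{N} (X_{ih} - X_{jh}) \pm \frac{l}{2}. \tag{4.2.2}$$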
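The Problem II rule can be sketched in the same way. Here `q` stands for an upper $\alpha$ point of the Studentized Range with $(k, m)$ degrees of freedom, assumed to be looked up separately (for instance in tables such as May's [8]); `max(n, ceil(x))` equals $\max\{n, [x] + 1\}$ for the bracket used in (4.2.1). The helper names are hypothetical.

```python
import math
from itertools import combinations

def problem_two_size(s2, n, length, q):
    """N = max{n, [s^2/c] + 1} with c = (l / (2 q))^2.

    q is an assumed tabulated upper-alpha point of the Studentized Range
    with (k, m) degrees of freedom.
    """
    c = (length / (2.0 * q)) ** 2
    return max(n, math.ceil(s2 / c))      # ceil(x) == [x] + 1 for "greatest integer less than"

def difference_intervals(samples, length):
    """Intervals (4.2.2) for every theta_i - theta_j, i < j (0-based indices).

    samples[i] holds X_{i1}, ..., X_{iN}; every row must have the same length N.
    """
    N = len(samples[0])
    if any(len(row) != N for row in samples):
        raise ValueError("Problem II uses a common total sample size N")
    intervals = {}
    for i, j in combinations(range(len(samples)), 2):
        centre = sum(x - y for x, y in zip(samples[i], samples[j])) / N
        intervals[(i, j)] = (centre - length / 2, centre + length / 2)
    return intervals
```

Because every population is sampled to the same $N$, each pairwise interval uses the paired differences $X_{ih} - X_{jh}$ directly.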


4.3 Problem III: Estimation of Contrasts.

STATEMENT OF PROBLEM III. Again take $n_1 = n_2 = \cdots = n_k = n$. Given
$0 < 1 - \alpha < 1$ and $l > 0$, determine a system of simultaneous confidence
intervals $I_\nu$ for the elements of the set $L$ of all linear functions $\sum_{i=1}^{k} C_{i\nu}\theta_i$ with
$\sum_{i=1}^{k} C_{i\nu}^2 = 1$. (The index $\nu$ denotes a particular element of $L$.)

The intervals $I_\nu$ are to have the properties

(1) length of $I_\nu(X) = l$,

(2) $\Pr\{\sum_{i=1}^{k} C_{i\nu}\theta_i \in I_\nu(X) \text{ for all } \nu\} \ge 1 - \alpha$ for all $\theta_1, \theta_2, \ldots, \theta_k, \sigma$.

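SOLUTION OF PROBLEM III. Determine a constant $c$ such that

$$\Pr\left\{W_3 \le \frac{l^2}{4c}\right\} = 1 - \alpha$$

when $W_3/k$ has the $F$-distribution with $(k, m)$ degrees of freedom. Determine $N$ by

$$N = \max\{n, [s^2/c] + 1\}. \tag{4.3.1}$$

Observe $X_{i,n+1}, X_{i,n+2}, \ldots, X_{iN}$ $(i = 1, 2, \ldots, k)$ if $N > n$, and estimate $\sum_{i=1}^{k} C_{i\nu}\theta_i$ by
the interval

$$\frac{1}{N}\sum_{h=1}^{N}\sum_{i=1}^{k} C_{i\nu}X_{ih} \pm \frac{l}{2}. \tag{4.3.2}$$

Note that to estimate some linear function $\sum_{i=1}^{k} d_i\theta_i$, where $\sum_{i=1}^{k} d_i^2 \ne 1$, we
simply employ the interval

$$\frac{1}{N}\sum_{h=1}^{N}\sum_{i=1}^{k} d_iX_{ih} \pm \frac{l}{2}\Big(\sum_{i=1}^{k} d_i^2\Big)^{1/2}.$$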
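For Problem III, a sketch of the constant $c$ and of the interval for an arbitrary (not necessarily normalized) linear function is below. `f_point` denotes the upper $\alpha$ point of $W_3$, i.e. $k$ times a tabulated upper point of $F(k, m)$; how it is obtained is left to the reader, and the names are illustrative only.

```python
import math

def problem_three_constant(length, f_point):
    """c with Pr{W_3 <= l^2 / (4c)} = 1 - alpha, given f_point = upper-alpha point of W_3."""
    return length ** 2 / (4.0 * f_point)

def linear_function_interval(means, d, length):
    """Interval for sum d_i theta_i: sum d_i Xbar_i +/- (l/2) (sum d_i^2)^(1/2).

    means -- the overall sample means Xbar_i = (1/N) sum_h X_ih
    d     -- coefficients d_1, ..., d_k (need not be normalized)
    """
    centre = sum(d_i * m_i for d_i, m_i in zip(d, means))
    half = 0.5 * length * math.sqrt(sum(d_i * d_i for d_i in d))
    return centre - half, centre + half
```

For $d = (3, 4)$ the half-length is $5l/2$, five times that used for a normalized function, in line with the note above.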







The argument required to justify these solutions is essentially the same as that 
originally given by Stein, and details will be omitted. 


5. Three applications. In this section we apply the results of the previous section
to the two-sample versions of three simultaneous estimation examples
treated by Bose and Roy and by Scheffé for the single-sample case.

5.1. $2^r$ Factorial experiment. In this example we will utilize Problem I of the
preceding section. Suppose that in an experiment involving r factors, each at 
two levels, it is desired to obtain joint confidence intervals of fixed length for the 
r main effects and r(r — 1)/2 two-factor interactions. Suppose also that the 
experimental situation is replicable as many times as desired. An example might 
be an experiment to discover the effect on the yield of a chemical reaction of the 
addition or nonaddition of different reagents. 

Let $\theta_{11}, \theta_{22}, \ldots, \theta_{rr}$ be the true values of the main effects, and let $\theta_{12}, \theta_{13}, \ldots,
\theta_{r-1,r}$ be the true two-factor interactions. Denote the factors by $A_1, A_2, \ldots, A_r$,
and let the symbolic product $(a_{i_1}a_{i_2}\cdots a_{i_s})$ denote the true yield when factors
$A_{i_1}, A_{i_2}, \ldots, A_{i_s}$ are at their upper levels and all other factors are at their
lower levels. For the chemical illustration, $(a_1a_2a_3)$ means the true yield when
reagents 1, 2, 3 have been added, and no others.
Then $\theta_{ii}$ is defined by the result of multiplying out the expression

$$(a_1 + 1)(a_2 + 1)\cdots(a_{i-1} + 1)(a_i - 1)(a_{i+1} + 1)\cdots(a_r + 1),$$

and $\theta_{ij}$ is defined by the result of multiplying out the expression

$$(a_1 + 1)\cdots(a_{i-1} + 1)(a_i - 1)(a_{i+1} + 1)\cdots(a_{j-1} + 1)(a_j - 1)(a_{j+1} + 1)\cdots(a_r + 1),$$

where the dots indicate terms with plus 1's. For further details on factorial
experiments, the reader is referred to [9], for example.
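The sign rule implied by multiplying out these products can be sketched as follows: in the expansion for $\theta_{ii}$, the yield of a treatment combination enters with sign $+$ when $A_i$ is at its upper level and $-$ otherwise; for $\theta_{ij}$ the sign is the product of the signs for $A_i$ and $A_j$. The sketch also multiplies the signed sum by $1/2^{r-1}$. That scaling is our assumption: it is the one under which an estimate has the variance $\sigma^2/2^{r-2}$ stated in the next paragraph, but the text does not state it explicitly. Factors are indexed from 0 in the code.

```python
def effect_estimates(yields, r):
    """Estimates Y_ij (i <= j) from one replication of a 2^r factorial.

    yields -- dict mapping a tuple of r flags (1 = upper level, 0 = lower level)
              to the observed yield of that treatment combination
    Returns a dict keyed by (i, j) with 0-based factor indices; (i, i) is a main
    effect and (i, j), i < j, a two-factor interaction.
    """
    scale = 1.0 / 2 ** (r - 1)      # assumed normalisation (see lead-in)
    estimates = {}
    for i in range(r):
        for j in range(i, r):
            total = 0.0
            for levels, y in yields.items():
                sign = 1 if levels[i] else -1          # from the factor (a_i - 1)
                if j != i:
                    sign *= 1 if levels[j] else -1     # from the factor (a_j - 1)
                total += sign * y
            estimates[(i, j)] = scale * total
    return estimates
```

For a $2^2$ layout with yields $(1) = 10$, $(a_1) = 14$, $(a_2) = 12$, $(a_1a_2) = 20$, the sketch gives main effects 6 and 4 and interaction 2.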

We will conduct the experiment by taking some number of replications of the
$2^r$ factorial design. Let $Y_{ijh}$ represent the usual estimate of $\theta_{ij}$ from the $h$th
replication. That is, $Y_{ijh}$ is obtained by substituting the observed yields for the true
yields in the expression defining $\theta_{ij}$. Then the usual assumptions are that $Y_{ij1},
Y_{ij2}, \ldots$, $i \le j \le r$, are mutually independent random variables with $Y_{ijh}$
distributed $N[\theta_{ij}, \sigma^2/2^{r-2}]$, where $\sigma^2$ is the variance of a single observed yield.
Suppose that an unbiased estimate of $\sigma^2$ can be obtained from the sum of squares
due to replications.

Having decided upon a confidence coefficient $1 - \alpha$ and a common length $l$,
the problem before us is to produce $r + r(r-1)/2$ confidence intervals of length
$l$, one each for the $\theta_{ij}$, $i \le j \le r$. Except for notational changes, this is exactly
Problem I with a common length $l$ for all the intervals. It remains to adapt the
solution of Problem I to the present case.

The first step is to obtain a preliminary estimate of $\sigma^2/2^{r-2}$. There are a variety
of ways to accomplish this. A simple one to describe is:











































Choose an integer $n$ and perform $n$ replications of the $2^r$ factorial design;
compute the replications sum of squares, say $T$, which will be based on $(n-1)2^r$
degrees of freedom; and estimate $\sigma^2/2^{r-2}$ by $T/[(n-1)2^r 2^{r-2}] = T/[(n-1)2^{2r-2}]$.

Determine a constant $c$ such that

$$\Pr\left\{W_1 \le \frac{l}{2c^{1/2}}\right\} = 1 - \alpha$$

when $W_1$ has the distribution of the Studentized Maximum Modulus with
$r + r(r-1)/2$, $2^r(n-1)$ degrees of freedom; determine $N$ by

$$N = \max\left\{n, \left[\frac{T}{(n-1)2^{2r-2}c}\right] + 1\right\};$$

perform $N - n$ further replications of the $2^r$ factorial design; and estimate
$\theta_{ij}$ by the interval

$$\frac{1}{N}\sum_{h=1}^{N} Y_{ijh} \pm \frac{l}{2}, \qquad i \le j \le r.$$
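The arithmetic of the sample-size step for the $2^r$ example can be sketched as follows. `w` is assumed to be a tabulated upper $\alpha$ point of the Studentized Maximum Modulus with the two degrees-of-freedom parameters returned by `smm_degrees_of_freedom`; all names are hypothetical.

```python
import math

def smm_degrees_of_freedom(r, n):
    """(number of intervals, error d.f.) = (r + r(r-1)/2, 2^r (n-1))."""
    return r + r * (r - 1) // 2, 2 ** r * (n - 1)

def factorial_replications(T, n, r, length, w):
    """N = max{n, [T / ((n-1) 2^(2r-2) c)] + 1} for the 2^r example.

    T -- replications sum of squares from the first n replications,
         on (n-1) 2^r degrees of freedom
    w -- assumed tabulated upper-alpha point of the Studentized Maximum Modulus
    """
    var_hat = T / ((n - 1) * 2 ** (2 * r - 2))   # estimates sigma^2 / 2^(r-2)
    c = (length / (2.0 * w)) ** 2
    return max(n, math.ceil(var_hat / c))        # ceil(x) == [x] + 1
```

With $T = 64$, $n = 3$, $r = 3$, $l = 1$ and $w = 2$, the estimate of $\sigma^2/2^{r-2}$ is 2, $c = 1/16$, and 32 replications in all are called for.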


Of course, it is not necessary to replicate the entire design in order to estimate
$\sigma^2/2^{r-2}$. If only a portion of the design is replicated for this purpose, a question
not encountered before can arise: what to do if the total number of replications
required, based on the estimate of $\sigma^2$, is smaller than the number of replications
already obtained of a portion of the design. This question seems too special to discuss
here beyond pointing it out.

In practice it would be unlikely that all main effects and 2-factor interactions 
would be of equal importance. It would be tempting to specify different lengths 
as in Problem I. However, the factorial design requires that each combination 
be replicated equally often in order to get orthogonal estimates; and the estimates 
all have a common variance. In this case, one might end up choosing N based on 
the smallest length and would then get the same results as though all lengths 
had been specified equal to the smallest. 

5.2. Randomized blocks experiment. In a randomized block setup for comparing 
k treatments, suppose it is desired to obtain joint confidence intervals of fixed 
length and confidence coefficient for the k(k — 1)/2 differences between the 
true treatment means. This example utilizes the solution of Problem II. 

We will assume the following conventional model for any given block, say
block number $h$:

$$Y_{ih} = \mu + \theta_i + b_h + e_{ih},$$

where $Y_{ih}$ is the observation on treatment $i$ in block $h$,
$\mu$ is a constant,
$\theta_i$ is the contribution from the $i$th treatment,
$b_h$ is the contribution from the $h$th block,
$e_{ih}$ are mutually independent, each distributed $N(0, \sigma^2)$, for $i = 1, 2, \ldots, k$;
$h = 1, 2, \ldots$.


Assume that the experiment can be replicated in as many blocks as desired. 




The problem is this: given a confidence coefficient $1 - \alpha$ and a length $l$,
produce $k(k-1)/2$ simultaneous confidence intervals, each of length $l$, one each
for the differences $\theta_i - \theta_j$ $(i < j \le k)$. To recognize this as Problem II, let

$$X_{ih} = Y_{ih} - \mu - b_h. \tag{5.2.1}$$

Then $X_{i1}, X_{i2}, \ldots$ $(i = 1, 2, \ldots, k)$ are mutually independent random
variables, $X_{ih}$ being distributed $N(\theta_i, \sigma^2)$. In terms of the $X_{ih}$, the problem is
exactly Problem II. Although we cannot observe the $X_{ih}$, we can nevertheless
write down the solution from (4.2) in terms of the $X_{ih}$; we will then discover that
the solution to the original problem depends only on the original
observations $Y_{ih}$.

Following (4.2), the experiment can be conducted as follows: 

Choose an integer $n$ and perform the randomized block experiment with $n$
blocks, and estimate $\sigma^2$. The conventional estimate, which we adopt and call $T$,
is based on $(k-1)(n-1)$ degrees of freedom. This estimate is computed from
the $Y_{ih}$ that are observed.

Determine a constant $c$ such that

$$\Pr\left\{W_2 \le \frac{l}{2c^{1/2}}\right\} = 1 - \alpha$$

when $W_2$ has the distribution of the Studentized Range with $k$, $(k-1)(n-1)$
degrees of freedom; determine $N$ by

$$N = \max\{n, [T/c] + 1\};$$

perform a second randomized block experiment with $N - n$ blocks, if $N > n$;
and estimate $\theta_i - \theta_j$ by the interval

$$\frac{1}{N}\sum_{h=1}^{N} (X_{ih} - X_{jh}) \pm \frac{l}{2}, \qquad i < j \le k.$$

This is the solution in terms of the $X_{ih}$. To get the solution in terms of the
$Y_{ih}$, note from (5.2.1) that each interval is

$$\frac{1}{N}\sum_{h=1}^{N} \big(Y_{ih} - \mu - b_h - \{Y_{jh} - \mu - b_h\}\big) \pm \frac{l}{2} = \frac{1}{N}\sum_{h=1}^{N} (Y_{ih} - Y_{jh}) \pm \frac{l}{2}.$$
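A small numerical check of the cancellation just derived: adding an arbitrary effect $b_h$ to every observation in block $h$ leaves the interval unchanged, because only within-block differences $Y_{ih} - Y_{jh}$ enter. The data and function names below are made up for illustration.

```python
def block_difference_interval(Y, i, j, length):
    """(1/N) sum_h (Y_ih - Y_jh) +/- l/2, where Y[i][h] is treatment i in block h."""
    N = len(Y[i])
    centre = sum(Y[i][h] - Y[j][h] for h in range(N)) / N
    return centre - length / 2, centre + length / 2

def add_block_effects(Y, b):
    """Copy of Y with block effect b[h] added to every treatment in block h."""
    return [[y + b_h for y, b_h in zip(row, b)] for row in Y]
```

Applying `add_block_effects` with any block effects and recomputing gives the same interval, which is why the unknown $\mu$ and $b_h$ never need to be estimated.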

5.3. Two-way analysis of variance with replications. Consider a situation ap- 
propriately represented by a two-way classification, say by rows and by columns, 
and suppose there are k rows and p columns. Suppose further that the situation 
is replicable as many times as desired. 

Let $Y_{ijh}$ denote the observation in the $i$th row, $j$th column of replication
number $h$. We adopt the following conventional model for the $h$th replication:










$$Y_{ijh} = \mu + r_i + t_j + b_{ij} + e_{ijh}, \qquad i = 1, 2, \ldots, k;\ j = 1, 2, \ldots, p, \tag{5.3.1}$$

where $\mu$ is a constant,
$r_i$ is the contribution from the $i$th row,
$t_j$ is the contribution from the $j$th column,
$b_{ij}$ is the contribution from the $(i, j)$ cell,
$e_{ijh}$ are mutually independent random variables, each distributed $N(0, \sigma^2)$.


Suppose we are interested in comparisons between true row means, i.e., between
the quantities

$$\theta_i = \mu + r_i + \frac{1}{p}\sum_{j=1}^{p} b_{ij} = \mu + r_i + b_{i\cdot}, \qquad i = 1, 2, \ldots, k,$$

where $b_{i\cdot} = (1/p)\sum_{j=1}^{p} b_{ij}$. (We will use this dot notation in the usual way to
indicate summed-out subscripts.)

Suppose further that the situation is such that we cannot tell in advance what
row comparisons will be of interest or what rows may turn out to be important.
In this situation, since we do not know precisely what we want, it may be
desirable to ask for a fixed degree of accuracy for any and all confidence statements
that might be made about contrasts between the row means. A contrast is a
linear function $\sum_{i=1}^{k} C_i\theta_i$ such that $\sum_{i=1}^{k} C_i = 0$. If we fix the confidence
coefficient at $1 - \alpha$ for the infinite set of all possible contrasts, then for any
(necessarily finite) number of contrasts that we decide to estimate, the joint
confidence coefficient is at least $1 - \alpha$.

It is apparent, however, that requiring the confidence intervals for the various
contrasts to have a common fixed length would be asking too much: two
contrasts differing only by a constant multiplier and each estimated by an interval
of length $l$ are logically incompatible. We will ask instead that the intervals
for all contrasts $\sum_{i=1}^{k} C_i\theta_i$ such that $\sum_{i=1}^{k} C_i^2 = 1$ should have fixed length $l$;
i.e., we consider only "normalized" contrasts. This is equivalent to asking that
the interval for every contrast $\sum_{i=1}^{k} d_i\theta_i$ should have length $l(\sum_{i=1}^{k} d_i^2)^{1/2}$.

The problem is this: given a joint confidence coefficient $1 - \alpha$ and a length $l$,
produce a system of joint confidence intervals, each of length $l$, one each for
every normalized contrast $\sum_{i=1}^{k} C_i\theta_i$. We now have to reduce this problem to
Problem III. To this end, let

$$Y_{i\cdot h} = \frac{1}{p}\sum_{j=1}^{p} Y_{ijh} = \mu + r_i + b_{i\cdot} + t_\cdot + e_{i\cdot h}, \tag{5.3.2}$$

and let

$$X_{ih} = Y_{i\cdot h} - t_\cdot = \mu + r_i + b_{i\cdot} + e_{i\cdot h} = \theta_i + e_{i\cdot h}. \tag{5.3.3}$$

Then $X_{i1}, X_{i2}, \ldots$ $(i = 1, 2, \ldots, k)$ are mutually independent random
variables, $X_{ih}$ being distributed $N(\theta_i, \sigma^2/p)$.







The problem is now in the form of Problem III, except for the present
restriction that $\sum_{i=1}^{k} C_i = 0$, since we here are considering only contrasts. This
exception can be resolved by the following reduction.

Make an orthogonal transformation from the $X_{ih}$ to $Z_{jh}$, defined by

$$X_{ih} = \sum_{j=1}^{k} u_{ij}Z_{jh}, \tag{5.3.4}$$

such that

$$Z_{kh} = \frac{1}{\sqrt{k}}\sum_{i=1}^{k} X_{ih}. \tag{5.3.5}$$

Then, since the inverse of an orthogonal matrix is its transpose, $u_{ik} = 1/\sqrt{k}$
$(i = 1, 2, \ldots, k)$. In terms of the $Z_{jh}$, substitution from (5.3.4) and (5.3.5)
gives

$$\sum_{i=1}^{k} C_iX_{ih} = \sum_{i=1}^{k}\sum_{j=1}^{k} C_iu_{ij}Z_{jh} = \sum_{i=1}^{k}\sum_{j=1}^{k-1} C_iu_{ij}Z_{jh},$$

since $u_{ik} = 1/\sqrt{k}$ and $\sum_{i=1}^{k} C_i = 0$.
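Therefore,

$$\sum_{i=1}^{k} C_iX_{ih} = \sum_{j=1}^{k-1} d_jZ_{jh}, \tag{5.3.6}$$

where $d_j = \sum_{i=1}^{k} C_iu_{ij}$.
The $Z_{jh}$, $j \le k$, are independently normally distributed with common variance
$\sigma^2/p$. Setting $\eta_j = E\{Z_{jh}\}$, we have

$$\sum_{i=1}^{k} C_i\theta_i = \sum_{j=1}^{k-1} d_j\eta_j, \tag{5.3.7}$$

by taking the expectation of (5.3.6).
Also,

$$\text{if } \sum_{i=1}^{k} C_i^2 = 1, \text{ then } \sum_{j=1}^{k-1} d_j^2 = 1, \tag{5.3.8}$$

by computing the variance of both sides of (5.3.6).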

Therefore, in view of (5.3.6), (5.3.7), and (5.3.8), we have reduced the set of
all normalized contrasts $\sum_{i=1}^{k} C_i\theta_i$ to the set of all normalized linear functions
$\sum_{j=1}^{k-1} d_j\eta_j$, without a restriction that $\sum_{j=1}^{k-1} d_j = 0$. This reduction is given in [6].

The problem has become: to construct joint confidence intervals of length $l$
and confidence coefficient $1 - \alpha$ for all linear functions $\sum_{j=1}^{k-1} d_j\eta_j$, with
$\sum_{j=1}^{k-1} d_j^2 = 1$, based on the random variables $Z_{j1}, Z_{j2}, \ldots$ $(j = 1, 2, \ldots, k-1)$,
which are mutually independent, $Z_{jh}$ being distributed $N(\eta_j, \sigma^2/p)$. This is now,
in terms of the $Z_{jh}$, exactly Problem III, and we proceed to adapt its solution.
Again, it will turn out that the solution depends only on the $Y_{ijh}$, which can
be observed.
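The reduction can be checked numerically. The sketch below uses the Helmert matrix as one convenient choice of orthogonal $u_{ij}$ with $u_{ik} = 1/\sqrt{k}$ (the text allows any such matrix), computes $d_j = \sum_i C_iu_{ij}$ for a normalized contrast, and confirms that the coefficient of $Z_{kh}$ vanishes and $\sum_j d_j^2 = 1$, as (5.3.6) and (5.3.8) assert. Function names are ours.

```python
import math

def helmert_columns(k):
    """k x k orthogonal matrix U (list of rows) whose last column is 1/sqrt(k)."""
    U = [[0.0] * k for _ in range(k)]
    for j in range(1, k):                    # columns 0..k-2 (the paper's j = 1..k-1)
        norm = math.sqrt(j * (j + 1))
        for i in range(j):
            U[i][j - 1] = 1.0 / norm
        U[j][j - 1] = -j / norm
    for i in range(k):
        U[i][k - 1] = 1.0 / math.sqrt(k)     # u_ik = 1/sqrt(k), as required by (5.3.5)
    return U

def contrast_coordinates(C):
    """d_j = sum_i C_i u_ij for j = 1..k; the last entry should vanish for a contrast."""
    k = len(C)
    U = helmert_columns(k)
    return [sum(C[i] * U[i][j] for i in range(k)) for j in range(k)]
```

For $C = (1/2, 1/2, -1/2, -1/2)$ the coordinates are $(0, 2/\sqrt{6}, 2/\sqrt{12}, 0)$, whose squares sum to 1.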







Following 4.3, the experiment can be conducted as follows. Choose an integer $n$
and perform $n$ replications of the two-way layout in order to estimate $\sigma^2/p$.
Suppose that the replications sum of squares, say $T$, is based on $kp(n-1)$ degrees of
freedom, so that $T/[kp(n-1)]$ is an unbiased estimate of $\sigma^2$. Estimate $\sigma^2/p$,
therefore, by $T/[kp^2(n-1)]$.


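Determine a constant $c$ such that

$$\Pr\left\{W_3 \le \frac{l^2}{4c}\right\} = 1 - \alpha$$

when $W_3/(k-1)$ has the $F$-distribution with $k - 1$, $kp(n-1)$ degrees of freedom;
determine $N$ by

$$N = \max\left\{n, \left[\frac{T}{kp^2(n-1)c}\right] + 1\right\};$$

perform $N - n$ further replications of the two-way layout, if $N > n$; and estimate

$$\sum_{j=1}^{k-1} d_j\eta_j = \sum_{i=1}^{k} C_i\theta_i$$

by the interval

$$\frac{1}{N}\sum_{h=1}^{N}\Big(\sum_{j=1}^{k-1} d_jZ_{jh}\Big) \pm \frac{l}{2}. \tag{5.3.9}$$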


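To express this interval in terms of the original observations, substitute in
(5.3.9) from (5.3.6) and (5.3.3), obtaining

$$\frac{1}{N}\sum_{h=1}^{N}\sum_{i=1}^{k} C_iX_{ih} \pm \frac{l}{2}$$

or

$$\frac{1}{N}\sum_{h=1}^{N}\sum_{i=1}^{k} C_i(Y_{i\cdot h} - t_\cdot) \pm \frac{l}{2}$$

or, since $\sum_{i=1}^{k} C_i = 0$,

$$\frac{1}{Np}\sum_{h=1}^{N}\sum_{i=1}^{k}\sum_{j=1}^{p} C_iY_{ijh} \pm \frac{l}{2}$$

or

$$\sum_{i=1}^{k} C_iY_{i\cdot\cdot} \pm \frac{l}{2}.$$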
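The two-way procedure reduces to a little arithmetic on the observed $Y_{ijh}$, sketched below. `w3` is assumed to be the upper $\alpha$ point of $W_3$ (that is, $k - 1$ times a tabulated upper point of $F(k-1, kp(n-1))$), and the array layout `Y[h][i][j]` and function names are our choice.

```python
import math

def two_way_total_replications(T, n, k, p, length, w3):
    """N = max{n, [T / (k p^2 (n-1) c)] + 1} with c = l^2 / (4 w3).

    T  -- replications sum of squares on k p (n-1) degrees of freedom
    w3 -- assumed tabulated upper-alpha point of W_3
    """
    var_hat = T / (k * p * p * (n - 1))      # estimate of sigma^2 / p
    c = length ** 2 / (4.0 * w3)
    return max(n, math.ceil(var_hat / c))    # ceil(x) == [x] + 1

def row_contrast_interval(Y, C, length):
    """sum_i C_i Ybar_i.. +/- l/2 for a normalized contrast C; Y[h][i][j] as in the text."""
    if abs(sum(C)) > 1e-9 or abs(sum(x * x for x in C) - 1.0) > 1e-9:
        raise ValueError("C must be a normalized contrast")
    N, k, p = len(Y), len(Y[0]), len(Y[0][0])
    row_means = [sum(Y[h][i][j] for h in range(N) for j in range(p)) / (N * p)
                 for i in range(k)]
    centre = sum(c_i * m_i for c_i, m_i in zip(C, row_means))
    return centre - length / 2, centre + length / 2
```

The column effects $t_j$ never need to be known: they cancel because the $C_i$ sum to zero, exactly as in the last two displays.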
5.4 Comments on the examples. The preceding three examples were chosen for 
illustrative purposes. Depending on actual circumstances, any analysis of vari- 
ance design could produce any of the three types of problems we have considered, 
that is, estimation of independent effects, differences, or general contrasts. 
In each example we have specified a way, arbitrarily, to estimate the variance,
utilizing the experimental design involved. It is perhaps worth remarking that it
is only necessary to have an independent unbiased estimate of the variance.
Where it comes from is immaterial, and in practice it may come from some other
experiment (though in theory it should not have been used for any other purpose).
This fact is inherent in the setup of Problems I, II, III, since we allow the initial
sample sizes $n_i$ to be 0.

Another feature of the examples is the lack of discussion of how to choose the 
degrees of freedom on which to base the estimate of variance. Such a discussion 
would presumably be based on tables of the expected total sample size, but such 
tables are lacking for these problems. 

Also, somewhat more general situations than the preceding examples would
indicate are reducible to one of the Problems I, II, or III. In particular, the
situations in Scheffé [6], p. 87, and Bose and Roy [7], pp. 515-519, when posed as
problems of simultaneous confidence intervals of fixed length and confidence coefficient,
are so reducible. The methods of reduction are essentially the same as those indicated
in these papers.


6. Multivariate problems and solutions. Throughout the discussion of multivariate
problems, we will denote all matrices by boldface letters; primes will
denote matrix transposes.

6.1. Problem IV: Estimation of Contrasts. Problems IV and V of
this section deal with the following multivariate situation:

$X_h' = (X_{1h}, X_{2h}, \ldots, X_{kh})$, $h = 1, 2, \ldots$,
are mutually independent random vectors, each $X_h$ having the multivariate
normal distribution with mean vector $\theta$ and covariance matrix $\boldsymbol{\Sigma} = (\sigma_{ij})$.

A nonnegative integer $n$ has somehow been determined, and if $n > 0$, values of
$X_1, X_2, \ldots, X_n$ have been observed. If $n = 0$, no observations on $X$ have been
taken.

$\mathbf{S} = (S_{ij})$ is a matrix of random variables $S_{ij}$, the $S_{ij}$ having a Wishart
distribution with $m$ degrees of freedom; $S_{ij}$ is an unbiased estimate of $\sigma_{ij}$, and the
$S_{ij}$ are independent of $\sum_{h=1}^{n} X_h$ and $X_{n+1}, X_{n+2}, \ldots$. A value of $S_{ij}$ has been
observed, $i \le j \le k$. These may or may not have been computed from the $X_h$,
$h \le n$.

STATEMENT OF PROBLEM IV. Given $0 < 1 - \alpha < 1$ and $l > 0$, determine a
system of simultaneous confidence intervals $I_\nu(X)$ for the elements of the set $L$ of
all linear functions $\sum_{i=1}^{k} C_{i\nu}\theta_i$ with $\sum_{i=1}^{k} C_{i\nu}^2 = 1$. The intervals are to have the
properties that

(1) the length of $I_\nu(X) = l$ for all $\nu$,

(2) $\Pr\{\sum_{i=1}^{k} C_{i\nu}\theta_i \in I_\nu(X) \text{ for all } \nu\} \ge 1 - \alpha$ for all $\theta$ and $\boldsymbol{\Sigma}$. (6.1.1)







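SOLUTION OF PROBLEM IV. Determine a constant $c$ such that

$$\Pr\left\{W_4 \le \frac{l^2}{4c}\right\} = 1 - \alpha$$

when $(m - k + 1)W_4/k$ has the $F$-distribution with $(k, m - k + 1)$ degrees of
freedom; determine $N$ by

$$N = \max\left\{n, \left[\frac{\lambda}{c}\right] + 1\right\}, \tag{6.1.2}$$

where $\lambda$ is the largest latent root of $\mathbf{S}$; observe $X_{n+1}, X_{n+2}, \ldots, X_N$, and
estimate $\sum_{i=1}^{k} C_{i\nu}\theta_i$ by

$$C_\nu'\bar{X} \pm l/2, \tag{6.1.3}$$

where $C_\nu' = (C_{1\nu}, C_{2\nu}, \ldots, C_{k\nu})$ and $\bar{X} = (1/N)\sum_{h=1}^{N} X_h$.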
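JUSTIFICATION OF SOLUTION. We have to establish (6.1.1). Now,

$$\Pr\{|C_\nu'(\bar{X} - \theta)| \le l/2 \text{ for all } \nu\} = \Pr\left\{N|C_\nu'(\bar{X} - \theta)|^2 \le \frac{Nl^2}{4} \text{ for all } \nu\right\} \ge \Pr\left\{N|C_\nu'(\bar{X} - \theta)|^2 \le \frac{\lambda l^2}{4c} \text{ for all } \nu\right\},$$

since $N \ge \lambda/c$. Using the fact that

$$\sup_{C_\nu'C_\nu = 1} C_\nu'\mathbf{S}C_\nu = \lambda,$$

where $\lambda$ is the largest latent root of $\mathbf{S}$, it follows that

$$\Pr\left\{N|C_\nu'(\bar{X} - \theta)|^2 \le \frac{\lambda l^2}{4c} \text{ for all } \nu\right\} \ge \Pr\left\{\frac{N|C_\nu'(\bar{X} - \theta)|^2}{C_\nu'\mathbf{S}C_\nu} \le \frac{l^2}{4c} \text{ for all } \nu\right\} \tag{6.1.4}$$

$$= \Pr\left\{\sup_{C_\nu'C_\nu = 1}\frac{N|C_\nu'(\bar{X} - \theta)|^2}{C_\nu'\mathbf{S}C_\nu} \le \frac{l^2}{4c}\right\} = \Pr\left\{N(\bar{X} - \theta)'\mathbf{S}^{-1}(\bar{X} - \theta) \le \frac{l^2}{4c}\right\}, \tag{6.1.5}$$

since

$$\sup_{C_\nu'C_\nu = 1}\frac{(C_\nu'u)^2}{C_\nu'\mathbf{S}C_\nu} = u'\mathbf{S}^{-1}u.$$

Theorem 3.5, with $a_h = 1/N$, and the definition of $c$ imply that (6.1.5) is
equal to $1 - \alpha$; this establishes (6.1.1).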
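A sketch of the Problem IV computations. The largest latent root $\lambda$ of $\mathbf{S}$ is found here by plain power iteration, which is our choice of method, not the paper's; it assumes $\mathbf{S}$ is symmetric positive definite and that the starting vector is not orthogonal to the leading latent vector. `w4` is an assumed tabulated upper $\alpha$ point of $W_4$, and the function names are hypothetical.

```python
import math

def largest_latent_root(S, iterations=500):
    """Largest latent root of a symmetric positive definite matrix by power iteration."""
    k = len(S)
    v = [1.0 / math.sqrt(k)] * k
    root = 0.0
    for _ in range(iterations):
        w = [sum(S[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
        # Rayleigh quotient v' S v for the current unit vector
        root = sum(v[i] * sum(S[i][j] * v[j] for j in range(k)) for i in range(k))
    return root

def problem_four_size(S, n, length, w4):
    """N = max{n, [lambda/c] + 1} with c = l^2 / (4 w4)."""
    c = length ** 2 / (4.0 * w4)
    return max(n, math.ceil(largest_latent_root(S) / c))

def problem_four_interval(C, xbar, length):
    """Interval (6.1.3): C' Xbar +/- l/2, for a unit vector C."""
    centre = sum(c_i * x_i for c_i, x_i in zip(C, xbar))
    return centre - length / 2, centre + length / 2
```

For $\mathbf{S}$ with rows $(4, 1)$ and $(1, 3)$ the iteration returns approximately $(7 + \sqrt{5})/2 \approx 4.618$.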







6.2. Problem V: Estimation of Mean Vector.

Problem V is a multivariate analogue of the original Stein procedure.

STATEMENT OF PROBLEM V. Given $0 < 1 - \alpha < 1$ and $l > 0$, construct a
confidence region $R(X)$ for $\theta$ such that

(6.2.1) (1) the maximum diameter of $R(X)$ does not exceed $l$,
(6.2.2) (2) $\Pr\{\theta \in R(X)\} \ge 1 - \alpha$.

Here it is possible to obtain a solution for which the maximum diameter of
$R(X)$ is exactly $l$, but the solution we present is uniformly better.
SoLUuTION OF Prosiem V. Determine a constant ¢ such that 


when (m — k + 1) Ws/k has the F-distribution with (k, m — k + 1) degrees of 
freedom. Determine N by 


N max <¢ m3] +- ] 
\ c_| 


where X is the largest latent root of S. 
Observe X41, Xni2, °°: , Xw, and estimate 6 by the set R(X) of points t 
satisfying 


(6.2.3) ra -vos'ad ~¢ 


where X = 1/N oi, . 

A similar problem in which R(X) is required to have predetermined volume is 
solvable in a similar way but is possibly less useful. 

JUSTIFICATION OF SOLUTION. We have to establish (6.2.1) and (6.2.2) when
$R(X)$ is the set of points $t$ satisfying (6.2.3). Now (6.2.2) means that

$$\Pr\left\{N(\bar{X} - \theta)'\mathbf{S}^{-1}(\bar{X} - \theta) \le \frac{l^2}{4c}\right\} \ge 1 - \alpha$$

when $\theta$ is the true mean vector, and hence follows from the definition of $c$ and
Theorem 3.5 with $a_h = 1/N$.

To establish (6.2.1), we first note that (with probability one) (6.2.3) defines
the interior and boundary of an ellipsoid in $k$-dimensional space. Now, if $u'\mathbf{A}u = 1$
is the equation of an ellipsoid, its maximum diameter is $2\sqrt{\lambda}$, where $\lambda$ is the
largest latent root of $\mathbf{A}^{-1}$. Replacing $\mathbf{A}$ by $4cN\mathbf{S}^{-1}/l^2$, it follows that the maximum
diameter of the ellipsoid associated with (6.2.3) is $l\sqrt{\lambda}/\sqrt{cN} \le l$, since
$N \ge \lambda/c$. This establishes (6.2.1).
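The region (6.2.3) and its diameter bound can be checked with a short sketch. A small Gaussian-elimination solver stands in for $\mathbf{S}^{-1}$ so that no external library is needed; it is meant for small $k$ only, and the function names are ours.

```python
import math

def solve(S, u):
    """Solve S x = u by Gaussian elimination with partial pivoting (small k only)."""
    k = len(u)
    A = [list(row) + [u[i]] for i, row in enumerate(S)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, k):
            factor = A[r][col] / A[col][col]
            for cc in range(col, k + 1):
                A[r][cc] -= factor * A[col][cc]
    x = [0.0] * k
    for r in range(k - 1, -1, -1):
        x[r] = (A[r][k] - sum(A[r][cc] * x[cc] for cc in range(r + 1, k))) / A[r][r]
    return x

def in_region(t, xbar, S, N, length, c):
    """True if t satisfies (6.2.3): N (Xbar - t)' S^{-1} (Xbar - t) <= l^2 / (4c)."""
    u = [a - b for a, b in zip(xbar, t)]
    quad = N * sum(ui * xi for ui, xi in zip(u, solve(S, u)))
    return quad <= length ** 2 / (4.0 * c)

def max_diameter(lam, N, length, c):
    """Maximum diameter l sqrt(lambda) / sqrt(c N) of the ellipsoid (6.2.3)."""
    return length * math.sqrt(lam) / math.sqrt(c * N)
```

With $\lambda = 2$ and $c = 0.1$, $N = 20$ gives a maximum diameter of exactly $l$ and any larger $N$ gives less, matching the condition $N \ge \lambda/c$.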


7. Concluding Remarks. As in the case of Stein's original work, it is possible
to modify the solutions of Problems I, II, and III slightly so as to obtain a
confidence procedure with a confidence coefficient of exactly $1 - \alpha$, and in Problem V
so that the maximum diameter is exactly $l$. Such modifications, however, result
in larger expected sample sizes. It should be emphasized that such modifications
do not make possible the attainment of an exact confidence coefficient
in Problem IV, because of the inequality (6.1.4). An intuitive picture of why
this is so may be given by the following remarks. The total sample
sizes are the same for Problems IV and V if $l$ and $1 - \alpha$ are the same. Thus, after
the completion of sampling, we have the "information" that $\theta$ lies in an ellipsoid
like (6.2.3). However, the simultaneous intervals constructed in Problem IV use
only the "information" that $\theta$ is in the sphere having the same center as (6.2.3)
and with diameter equal to the maximum diameter of (6.2.3). Thus, to the
extent that the ellipsoid (6.2.3) is smaller than the sphere, the actual confidence
coefficient will exceed $1 - \alpha$.

It is fairly obvious that using the confidence region $R(X)$ given for Problem
V, modified to have maximum diameter exactly $l$, to test a hypothesis concerning
$\theta$ does not yield a test with power independent of the unknown covariance matrix,
since the shape of the region $R(X)$ is not independent of the unknown
covariance matrix. It is possible, though in a wasteful and artificial way, to construct
a test of the multivariate hypothesis $\theta = 0$ having power independent of the
unknown covariance matrix. This can be done simply by estimating
$\sigma_{11}, \sigma_{22}, \ldots, \sigma_{kk}$ from $k$ separate subsamples of the original sample and
essentially applying the Stein procedure for the Student hypothesis to the
individual hypotheses $\theta_1 = 0, \theta_2 = 0, \ldots, \theta_k = 0$.


There are many other problems which come to mind in connection with the
Stein procedure and to which no specific allusion seems to have been made in
the literature. One is to obtain a confidence interval of fixed length and confidence
coefficient for a given linear function $\sum_{i=1}^{k} C_i\theta_i$ of the means of a general
$k$-variate normal distribution with unknown covariance matrix. This problem is
directly solved by Stein's paper [4], since

$$\frac{\sqrt{N}\sum_{i=1}^{k} C_i(\bar{X}_i - \theta_i)}{\big(\sum_{i=1}^{k}\sum_{j=1}^{k} C_iC_jS_{ij}\big)^{1/2}}$$

has the Student $t$-distribution with $m$ degrees of freedom, when the $S_{ij}$ are
unbiased estimates of $\sigma_{ij}$, having a Wishart distribution with $m$ degrees of freedom,
and $(\bar{X}_1, \bar{X}_2, \ldots, \bar{X}_k)$ is the (independent) sample mean vector based on $N$
observations.
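The statistic above, and the fixed-length sample size it leads to under Stein's two-stage rule, can be sketched as follows. `t_point` is assumed to be the upper $\alpha/2$ point of Student's $t$ with $m$ degrees of freedom, and the rule $N = \max\{n, [C'\mathbf{S}C/c] + 1\}$ with $c = (l/2t)^2$ is our transcription of Stein's procedure to this setting, not a formula stated in this paper; the function names are hypothetical.

```python
import math

def quadratic_form(C, S):
    """C' S C = sum_ij C_i C_j S_ij."""
    k = len(C)
    return sum(C[i] * C[j] * S[i][j] for i in range(k) for j in range(k))

def linear_function_t(C, xbar, theta, S, N):
    """sqrt(N) * sum_i C_i (Xbar_i - theta_i) / (C' S C)^(1/2)."""
    k = len(C)
    numerator = math.sqrt(N) * sum(C[i] * (xbar[i] - theta[i]) for i in range(k))
    return numerator / math.sqrt(quadratic_form(C, S))

def stein_fixed_length_size(C, S, n, length, t_point):
    """Hypothetical Stein-type rule: N = max{n, [C'SC / c] + 1}, c = (l / (2 t_point))^2."""
    c = (length / (2.0 * t_point)) ** 2
    return max(n, math.ceil(quadratic_form(C, S) / c))
```

Only the scalar $C'\mathbf{S}C$ enters, which is why the one-dimensional Stein procedure suffices here.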

If $\theta_1, \theta_2, \ldots, \theta_k$ are the means of $k$ independent normal populations with
unknown (unequal) variances, it is a direct extension of results of Chapman [12]
to obtain a confidence interval of fixed length and confidence coefficient for a
given linear function $\sum_{i=1}^{k} C_i\theta_i$. The procedure depends on the distribution of
the sum of $k$ independent Student $t$ variables, for which tables do not seem to
exist for $k > 2$. However, the normal approximation should be useful except for
very small values of $k$ and of the degrees of freedom $m$.




Finally, no study has been made of expected sample sizes for the procedures 
in this paper. Tables along the lines of Seelbinder [10] but with degrees of freedom 
k(n — 1) rather than n — 1 would be helpful for the univariate problems. Ex- 
pected sample sizes for the multivariate procedures would be more complicated. 


8. Acknowledgment. The author would like to thank Professor W. G. Madow 
for valuable assistance in the preparation of this paper. 


REFERENCES
[1] R. E. Bechhofer, C. W. Dunnett, and M. Sobel, "A two-sample multiple decision procedure for ranking means of normal distributions with a common unknown variance," Biometrika, Vol. 41 (1954), p. 170.
[2] G. B. Dantzig, "On the nonexistence of tests of 'Student's' hypothesis having power functions independent of σ," Ann. Math. Stat., Vol. 11 (1940), p. 186.
[3] E. L. Lehmann, "Theory of estimation," notes recorded by Colin Blyth, Associated Students' Store, University of California, 1950.
[4] C. Stein, "A two-sample test for a linear hypothesis whose power is independent of the variance," Ann. Math. Stat., Vol. 16 (1945), p. 243.
[5] J. W. Tukey, "The problem of multiple comparisons," unpublished paper.
[6] Henry Scheffé, "A method of judging all contrasts in the analysis of variance," Biometrika, Vol. 40 (1953), p. 87.
[7] R. C. Bose and S. N. Roy, "Simultaneous confidence interval estimation," Ann. Math. Stat., Vol. 24 (1953), p. 513.
[8] J. M. May, "Extended and corrected tables of the upper percentage points of the 'Studentized' range," Biometrika, Vol. 39 (1952), p. 192.
[9] O. Kempthorne, The Design and Analysis of Experiments, John Wiley and Sons, 1952.
[10] B. M. Seelbinder, "On Stein's two-stage sampling scheme," Ann. Math. Stat., Vol. 24 (1953), p. 640.
[11] K. C. S. Pillai and K. V. Ramachandran, "On the distribution of the ratio of the ith observation in an ordered sample from a normal population to an independent estimate of the standard deviation," Ann. Math. Stat., Vol. 25 (1954), p. 565.
[12] D. G. Chapman, "Some two-sample tests," Ann. Math. Stat., Vol. 21 (1950), p. 601.
[13] C. W. Dunnett, "A multiple comparison procedure for comparing several treatments with a control," J. Amer. Statist. Assoc., Vol. 50 (1955), p. 1096.








DISTRIBUTION OF THE SUM IN RANDOM SAMPLES FROM 
A DISCRETE POPULATION 


By Chia Kuei Tsao
Wayne University 


1. Introduction and summary. In [1], the author proposed a "rank sum"
criterion for testing whether or not a sample was drawn from a population
having a completely specified continuous distribution. Under the null hypothesis
this criterion is distributed as the sum in a random sample from a discrete
uniform population; otherwise it is distributed as the sum from a more general
discrete population. In this paper we give a method of finding numerically the
distribution of such random variables. Tables are given for the distribution of
the sums from certain selected discrete uniform distributions. Normal
approximations are investigated and applications are briefly discussed.


2. Distribution of the sample sum. Let $X_1, X_2, \ldots, X_m$ be a random sample
drawn from a population having the discrete density

$$f(x; p) = p_x, \qquad x = 1, 2, \ldots, k, \tag{2.1}$$

where

$$p = (p_1, p_2, \ldots, p_k), \qquad p_1 + p_2 + \cdots + p_k = 1. \tag{2.2}$$

Let $S$ be the sum of the sample values $X_1, X_2, \ldots, X_m$; i.e.,

$$S = \sum_{i=1}^{m} X_i. \tag{2.3}$$

We shall be interested in finding the distribution of the statistic $S$.

It is well known that when $k = 2$, the distribution of $S$ is binomial. For $k > 2$,
the distribution of $S$ in general does not assume a simple form. However, as is
easily seen, the pdf of $S$, denoted by $g(s; p, m)$, is given by the coefficient of $t^s$
in the power expansion of the generating function

$$D(t; p, m) = \Big(\sum_{i=1}^{k} p_it^i\Big)^m, \tag{2.4}$$

and the cdf $G(s; p, m) = \sum_{y=m}^{s} g(y; p, m)$ is given by the sum of the coefficients of
$t^y$ in (2.4) for $y = m, m+1, \ldots, s$.
Furthermore, for $m = 2, 3, \ldots$, we have

$$D(t; p, m) = D(t; p, m-1)D(t; p, 1). \tag{2.5}$$

Received October 18, 1954.







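It then follows that, for $m = 2, 3, \ldots$, we have

$$g(s; p, m) = \sum_{i=1}^{k} p_ig(s - i; p, m-1), \tag{2.6}$$

$$G(s; p, m) = \sum_{i=1}^{k} p_iG(s - i; p, m-1), \tag{2.7}$$

where

$$g(s; p, 1) = \begin{cases} p_s, & s = 1, 2, \ldots, k, \\ 0, & \text{otherwise}, \end{cases}$$

$$G(s; p, 1) = \begin{cases} 0, & s < 1, \\ \sum_{i=1}^{s} p_i, & 1 \le s \le k, \\ 1, & s > k. \end{cases}$$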
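Recursion (2.6) translates directly into code. The sketch below keeps exact rational arithmetic with `fractions.Fraction` when the $p_i$ are given as fractions; the list layout (index $s$ holds $g(s; p, m)$) and the function names are our choice.

```python
from fractions import Fraction

def pdf_of_sum(p, m):
    """List g with g[s] = g(s; p, m), built with recursion (2.6).

    p[x - 1] is the probability of the value x, x = 1, ..., k.
    """
    k = len(p)
    g = [0] * (k + 1)
    for s in range(1, k + 1):
        g[s] = p[s - 1]                            # g(s; p, 1) = p_s
    for size in range(2, m + 1):
        new = [0] * (k * size + 1)
        for s in range(size, k * size + 1):
            # g(s; p, size) = sum_i p_i g(s - i; p, size - 1)
            new[s] = sum(p[i - 1] * g[s - i] for i in range(1, k + 1)
                         if 0 <= s - i < len(g))
        g = new
    return g

def cdf_of_sum(p, m, s):
    """G(s; p, m) = sum_{y=m}^{s} g(y; p, m)."""
    g = pdf_of_sum(p, m)
    return sum(g[y] for y in range(m, min(s, len(g) - 1) + 1))
```

With $p_1 = p_2 = p_3 = 1/3$ and $m = 2$, `cdf_of_sum` gives $G(3) = 1/3$, the entry .3333333 in Table 1 at $m = 2$, $r = 1$; with $k = 2$ the result is the (shifted) binomial distribution.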


The above formulae provide a quick method of finding numerically the values of
$g(s; p, m)$ and $G(s; p, m)$, once $p = (p_1, p_2, \ldots, p_k)$ is known. In the special
case where $p_1 = p_2 = \cdots = p_k = 1/k$, the formulae (2.6) and (2.7) become

$$g(s; 1/k, m) = \frac{1}{k}\sum_{i=1}^{k} g(s - i; 1/k, m-1), \tag{2.6a}$$

$$G(s; 1/k, m) = \frac{1}{k}\sum_{i=1}^{k} G(s - i; 1/k, m-1). \tag{2.7a}$$

Here, we have used $g(s; 1/k, m)$ and $G(s; 1/k, m)$ as short notations for
$g(s; 1/k, \ldots, 1/k, m)$ and $G(s; 1/k, \ldots, 1/k, m)$ respectively.

The functions $g(s; 1/k, m)$ and $G(s; 1/k, m)$ are useful, and they are uniquely
determined for any given $k$ and $m$. The values of $G(s; 1/k, m)$ for $k = 3, 4, 5, 6$
and $m = 1, 2, \ldots, 20$ are tabulated in Tables 1, 2, 3, and 4 with $s = m + r$.
Since $g(s; 1/k, m)$ is symmetrical (see [1]), only half of the values of the cdf
$G(s; 1/k, m)$ are given. The remaining half can be obtained from the following
identity:

$$G(km - r; 1/k, m) = 1 - G(m + r - 1; 1/k, m). \tag{2.8}$$
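For the uniform case, the cdf recursion (2.7a) can be coded on its own and used to reproduce table entries and to check the symmetry identity (2.8). Memoization via `functools.lru_cache` keeps the recursion cheap for $m \le 20$, and exact fractions avoid rounding; the function name `G` simply mirrors the notation of the text.

```python
from fractions import Fraction
from functools import lru_cache

@lru_cache(maxsize=None)
def G(s, k, m):
    """G(s; 1/k, m) computed exactly with the uniform-case recursion (2.7a)."""
    if m == 1:
        # G(s; 1/k, 1): 0 below 1, s/k for 1 <= s <= k, 1 above k
        return Fraction(0) if s < 1 else Fraction(min(s, k), k)
    return sum(G(s - i, k, m - 1) for i in range(1, k + 1)) / k
```

For example, `round(float(G(16, 3, 8)), 7)` gives 0.5843621, the Table 1 entry for $m = 8$, $r = 8$, and `G(17, 4, 5) == 1 - G(7, 4, 5)` is an instance of (2.8) with $k = 4$, $m = 5$, $r = 3$.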





TABLE 1
Values of G(m + r; 1/3, m)
(An entry such as .0³45725 stands for .00045725; the superscript gives the number of zeros after the decimal point.)

  r        m=1        m=2        m=3        m=4        m=5        m=6        m=7
  0   .3333333   .1111111   .0370370   .0123457   .0041152   .0013717   .0³45725
  1   .6666667   .3333333   .1481482   .0617284   .0246914   .0096022   .0036580
  2              .6666667   .3703704   .1851852   .0864198   .0384088   .0164609
  3                         .6296296   .3827160   .2098766   .1069959   .0516690
  4                                    .6172840   .3950617   .2304527   .1252858
  5                                               .6049383   .4032922   .2469136
  6                                                          .5967078   .4101509
  7                                                                     .5898491

  r        m=8        m=9       m=10       m=11       m=12       m=13       m=14
  0   .0³15242   .0⁴50805   .0⁴16935   .0⁵56450   .0⁵18817   .0⁶62723   .0⁶20908
  1   .0013717   .0³50805   .0³18629   .0⁴67740   .0⁴24462   .0⁵87812   .0⁵31361
  2   .0068587   .0027943   .0011177   .0³44031   .0³17123   .0⁴65859   .0⁴25089
  3   .0239293   .0107199   .0046741   .0019927   .0³83358   .0³34309   .0³13924
  4   .0644719   .0317533   .0150892   .0069603   .0031311   .0013786   .0³59586
  5   .1412894   .0765635   .0396789   .0198140   .0095890   .0045179   .0020799
  6   .2607834   .1555149   .0879439   .0475707   .0247817   .0125006   .0061324
  7   .4156379   .2725702   .1682162   .0986130   .0553326   .0299011   .0156399
  8   .5843621   .4202611   .2827821   .1796474   .1086104   .0629082   .0351033
  9              .5797389   .4241901   .2917295   .1899966   .1179799   .0702630
 10                         .5758099   .4275940   .2996570   .1994213   .1267698
 11                                    .5724060   .4305765   .3067434   .2080482
 12                                               .5694235   .4332190   .3131279
 13                                                          .5667810   .4355811
 14                                                                     .5644189

  r       m=15       m=16       m=17       m=18       m=19       m=20       r*
  0   .0⁷69692   .0⁷23231   .0⁸77435   .0⁸25812   .0⁹86039   .0⁹28680
  1   .0⁵11151   .0⁶39492   .0⁶13938   .0⁷49042   .0⁷17208   .0⁸60227
  2   .0⁵94781   .0⁵35543   .0⁵13241   .0⁶49042   .0⁶18068   .0⁷66250
  3   .0⁴55823   .0⁴22139   .0⁵86960   .0⁵33865   .0⁵13087   .0⁶50218
  4   .0³25340   .0³10623   .0⁴43976   .0⁴17999   .0⁵72918   .0⁵29271
  5   .0³93833   .0³41585   .0³18141   .0⁴78026   .0⁴33137   .0⁴13913
  6   .0029360   .0013759   .0³63267   .0³28602   .0³12735   .0⁴55926
  7   .0079507   .0039417   .0019112   .0³90841   .0³42415   .0³19488
  8   .0189585   .0099484   .0050887   .0025442   .0012462   .0³59923
  9   .0403354   .0224149   .0121017   .0063672   .0032733   .0016479    9.268
 10   .0773787   .0455575   .0259736   .0143880   .0077664   .0040953   10.345
 11   .1350270   .0842470   .0507398   .0296050   .0167867   .0092755   11.403
 12   .2159820   .1427959   .0908668   .0558601   .0332844   .0192792   12.446
 13   .3189191   .2233093   .1501174   .0972414   .0609022   .0369911   13.476
 14   .4377093   .3242034   .2301029   .1570291   .1033768   .0658545   14.496
 15   .5622907   .4396397   .3290508   .2364237   .1635647   .1092812   15.508
 16              .5603603   .4414011   .3339183   .2423237   .1697551   16.512
 17                         .5585989   .4430169   .3376530   .2478471   17.512
 18                                    .5569831   .4445061   .3414943   18.509
 19                                               .5554939   .4458843   19.503
 20                                                          .5541157   20.497







TABLE 2 


Values of G(m+1r;1/4, 


. 2500000 0625000 .0156250 .0039063 .0°97656 .024414 061035 
. 5000000 . 1875000 .0625000 .0195313 0058594 0017090 0°48829 
.3750000 1562500 .0585938 .0205078 .0068359 0021973 
6250000 3125000 1367188 0546875 .0205078 0073242 
5000000 2578125 .1181641 0498047 0197144 
4140625 . 2167969 . 1025391 .0449219 
5859375 3486328 . 1845703 0893555 
5000000 . 2958985 1582031 
4291992 2530518 
5708008 3701172 


5000000 


11 


.0*15259 .0°38147 0° 95367 0° 23842 0759605 0714901 
0713733 .0°38147 .0410490 0528610 .0° 77486 0° 20862 
.0°68665 .0°20981 .0°62943 0418597 0°54240 
.0025177 0°83923 0°27275 0486784 0*27120 
.0074310 .0026932 0°94509 0°32282 0°10777 
.0185394 .0072937 0027590 0010099 0°35954 


0°37253 

0°55879 

0515646 0°44703 

0°83447 0525332 

0435271 0*11347 

0°12496 0442535 

0°38396 0713813 
0010467 (°3977 

0025724 0010320 

0057687 0024429 

0119141 0053254 

0228295 0107712 

5617371 3845062 . 2422247 . 1419988 0782261 0408334 0203364 

5000000 3376970 . 2114401 1238995 0685221 0360248 

4445419 . 2969108 1849623 1083569 0601355 

5554581 .3949804 2613325 1621051 0949544 

5000000 3508328 2302568 1423102 

1492277 3115888 . 2030769 

5507723 1030413 2767480 


5000000 3612217 


.0403290 .0172043 0070076 0027461 0010414 
.0780487 .0360870 0158196 .0066328 0026779 
1363831 0683250 0322275 .0144534 0062106 
2176819 . 1181106 0599318 0287466 .0131447 
. 3203430 1881142 1026592 0526595 025623 
4382629 2781677 . 1631794 0894995 0463398 


4528972 


5471028 


15 16 


.0°93132 .0°23283 .0!°58208 .0'914552 .0'' 36380 .0°290950 
.0714901 0839581 0810477 0°27649 0'°72760 0'°19099 


.0°12666 .0735623 0899525 0827649 0°76398 0°21009 


.0°75996 .0°22561 .0°66356 0719354 0856025 0816107 








_ 




16 


26 


99 


30 


(2.9 


0535958 


.0*14216 


048637 


.0914743 


0%40259 
0010027 
0022995 
0048929 
0097190 
0181145 


.0318170 


0528628 
0833562 
1251192 
1792 »7: 2 

2458392 
3234860 
4094924 
5N00000 


16 


0511243 


.0546745 


0416802 
053470 


0715322 


0340034 
0396305 


0021494 


.0044785 


0087565 


.0161358 
.0281283 
.0465376 


0732888 


.1101526 
. 1583968 


2184292 


. 2895225 


3697044 


-4558715 


5441285 


; Pp, m) 


SUM IN 


TABLE 2 


17 


.0°34738 
.0515150 
.0557067 
.0419018 
.0'57042 
.0915596 
.0939252 


0°91651 


.0019978 
.0040869 


0078801 


.0143748 
.0248895 


0410226 
0645268 


.0970940 


1400669 
1941253 
2590132 
3333819 
4148067 
5000000 


(mu; Pp, 





RANDOM 


SAMPLES 


Continued 


18 19 


0°10619 .0732145 


.0°48468 .0°15325 


.0519089 .0°62977 
0°66467 0522866 
0420820 .0°74651 
.0459431 0422202 
0915613 0*60758 
0738051 .0915422 
.0°86570 0936544 
.0018484 0781269 


0037203 0017037 
0070849 0033798 
0128078 0063654 


.0220418 .0114137 


0362035 0195345 
.0568833 0319841 
0856776 0502015 
. 1239532 0756794 
.1725748 . 1097722 


. 2316468 . 1534631 
.3003318 . 2071267 


3768005 . 2703385 


.4583455 .3417811 
.5416545 4192831 


. 5000000 




20 


096461 


0747940 
.0°20519 


.0°77545 


052€337 


0581458 


.0423178 


0*61162 


0315066 
0934828 


075902 
0015654 


.0030654 
.0057157 
.0101733 
.0173244 
.0282834 
.0443499 
.0669093 
.0972791 
. 1365104 
.1851751 
. 2431774 
. 2096323 
. 3828507 
.4604453 
.5395547 


1, (m + 1)/m, 


15 

































997 


16. 
.304 
.401 
-438 
467 
.489 
.504 
.514 


296 


519 


.521 
5.519 
.516 
.510 
.503 
497 


It should perhaps be pointed out that the density function of the sample mean, $\bar X = S/m$, is given by

$$h(u) = g(mu; p, m).$$


3. Normal approximations. Since $S$ is the sum of the observations in a random sample, the central limit theorem implies that $S$ is asymptotically normally distributed. Let $N(y; \mu, \sigma)$ denote the normal distribution function with mean $\mu$ and standard deviation $\sigma$. Then, for sufficiently large $m$ and suitable $s$, the value $G(s; p, m)$ can be approximated by $N(y; 0, 1)$ with

(3.1) $$y = \frac{s - mE(X)}{\sqrt{m\,[E(X^2) - E^2(X)]}}.$$

Conversely, when a size probability $\alpha$ of suitable magnitude is given, formula (3.1) can also be used to obtain an approximate value of $s$ or of $r = s - m$. For example, consider the case $p_1 = p_2 = \cdots = p_k = 1/k$. When $m$ is large, the approximate value of an integer $r$ is given by
/k. When m is large, the 









.0°25600 
.0423040 
.0711520 
.0342240 
.0012672 
0032742 
0075034 
0155520 
.0295680 
0520960 
0857344 
1326336 
1939200 
2691840 
3562240 
.4511488 
.5488512 


. 


.0400000 
. 1200000 
. 2400000 
-4000000 
- 6000000 


.0°51200 
.0551200 
.0*28160 


0711264 


.0°36608 
.0010204 


0025165 
0056038 
0114330 
0215987 
0380908 
0631168 
0987904 


. 1467136 
. 2075392 


. 2806221 


3638656 
4538368 


. 5461632 


Values of G(m + r; 1/5, m)


3 


.0080000 
.0320000 
.0800000 
. 1600000 
. 2800000 
- 4240000 
. 5760000 


10 


.0°10240 
0511264 
.0°67584 
0129286 
.0°10250 
0330648 
O®S80876 
0019239 
0041880 
0084345 
OLS5SS8486 
0279686 
0466059 
0736621 
1108502 
1593564 
. 2195062 
. 2905155 
. 3704054 
.4561244 


.5438756 


TABLE 3 


m 


4 
.0016000 
.0080000 
.0240000 
-0560000 
. 1120000 
. 1952000 

3040000 
4320000 
. 5680000 


m 
11 


0720480 
0°24576 
.0515974 
0574547 
0*27955 
0489231 
0°25076 
0363418 
0014659 
0031323 
0062407 
0116727 
0206091 
0345039 
0549871 
.OS36886 
1219961 
.1707781 
2301267 
2991816 
3760854 
1581031 
5418969 


.0°32000 
-0019200 
.0067200 
.0179200 
.0403200 
.0790400 
. 1382400 
. 2198400 
. 3222400 


.4390400 
.5609600 


0840960 
0753248 
0°37274 
0518637 
0°74547 
0425297 
0'75399 
0520192 
0949361 

0011145 
0023448 
0046292 
0086242 
0152318 
0256027 
0410923 
0631570 
0931908 


1323153 


. 1811542 
. 2396336 


. 3068550 


3810787 


.4598363 


5401637 


.0°44800 
.0017920 
.0053760 
.0134400 
.0291840 
.0564480 
.0990720 
. 1599360 


2396800 


. 3360640 
.4439680 


5560320 


0°81920 
0711469 


.0'86016 


0°45875 
0519497 
0570083 
0'22078 
062386 
0316074 
0938214 
0384604 
0017568 
0034412 
OO63889 
0112865 


.0190360 


0307416 
0476549 


.0710716 


1021819 


. 1418902 


1906298 


. 2482074 


.3137116 


3855134 
4613710 


5386290 


.0*12800 
0710240 


0*46080 


.0015360 
.0042240 
.0100480 
.0212480 


0407040 


.0716160 


1168640 
1782400 
2557440 
3471360 
4479360 
5520640 


0°16384 
0824576 
0719661 
011141 
0°50135 
0519028 
0563160 
0*18776 
0450831 
0°12687 
0329468 
0764162 
0013174 
0025630 
0047439 
OOS3819 
0141789 


.0230216 


0359581 
0541372 
0787080 
1106857 
1507962 
1993242 
2559905 
3198866 
3894865 
4627423 







SUM IN RANDOM SAMPLES 709 


TABLE 3—Continued 





r \ m     15     16     17     18     19     20     r*

0 | .0'°32768 .0''65536 -0'113107 .0'726214 0352429 -0'310486 

l .0°52429 .0°11141 0923593 .0"149807 .0'110486 .0'222021 

2 | .0844565 .0°10027 .0°22413 .0'°49807 .0'°11010 .01124222 

3 .0726739 0°63504 .0°14942 .0°34865 .0°°80740 .0'°18570 

$ .0°12701 .0731752 .0°78447 .0819176 .0°46426 .0°11142 

5 .0°50754 0°13325 .0734494 .0888162 .0822274 .0°55690 

6 | .0517703 048720 | .0°13191 .0735194 | .0992652 | .0824097 

7 | .0°55215 -0°15902 .0°44984 .0°12512 .0734279 .0°92632 

8 .0*15666 .0547180 .0513922 .0°40327 .0°11486 0732219 

9 .0440939 .0*12880 .0539617 .0511941 .0°35330 .0°10279 

10 099494 0432678 .0*10471 .0532814 .0510078 .0°30391 

11 .0°22655 .0°77634 .0*25901 .0584351 0526878 0°83961 

12 .0748628 .0317379 0160340 .0420413 0°67454 .0°21818 

13 | .0°98871 | .0936840 0713308 | .0°46750 | .0*16015 | .0°53618 

14 0019121 .0°74263 .0°27903 0710176 .0436128 .0412517 

15 0035296 0014286 0955822 .0821131 0°77734 .0*27862 

16 | .0062370  .0026307 | .0010688 | .0°41990 | .0°16003 | .0°59330 

17 | .0105778 | .0046490 | .0019639 | .0°80061 | .0°31607 0712119 

Is 0172569 0079027 .0034708 .0014681 0960035 0923806 

19 0271355 .0129474 .0059117 .0025947 -0010989 0945062 
20 .0412008 .0204816 .0097223 .0044275 0019422 | .0°82351 
21 0605021 .0313346 .0154631 .0073063 .0033195 .0014554 21.172 
22 | .0860570 | .0464305 | .0238194 | .0116774 | .0054948 | .0024911 22.240 
23 | .1187303 | .0667251 0355838 | .0181000 | .0088212 | .0041353 23.298 
24 . 1591009 .0931182 .0516180 .0272413 .0137505 .0066656 24.348 
25 2073366 . 1263454 0727908 .0398550 .0208360 0104444 25.390 
26 2630968 1668643 .0998967 .0567417 | .0307231 | .0159251 26 .426 
27 3254860 2147501 1335606 .0786900 | .0441256 | .0236513 27 .455 
28 | .3930727 2696186 | .1741393 1064011 | .0617858 | .0342442 28 .478 
29 1639773 3305939 2216345 . 1404044 .0844184 .0483778 29.496 
30 | .5360227 3963311 2756316 | .1809725 | .1126419 | .0667390 30.510 
31 .4650972 . 3352782 . 2280488 . 1469034 .0899750 31.519 
32 5349028 3993087 .2811985 | .1874051 . 1186309 32.525 
33 4661188 .3395944 2340437 . 1530825 33.528 
34 5338812 .4020437 | .2863716 193473 34.528 
35 .4670556 . 3435882 . 2396624 35.526 
36 .5329444 | .4045673 . 2911952 36.522 
37 .4679189 | .3472979 37.517 
38 .5320811 .4069054 38.510 
39 .4687176 39.504 
10 .5312824 40.496 

(3.2) $$r^* = \frac{m(k - 1)}{2} + z_\alpha \sqrt{\frac{m(k^2 - 1)}{12}},$$

where $z_\alpha$ is determined so that $N(z_\alpha; 0, 1) = \alpha$. This assumes, of course, that the value $\alpha = G(m + r; 1/k, m)$ is of moderate size. Accordingly, the last column of each of Tables 1, 2, 3, and 4 gives $r^*$ for $m = 20$ and for each integer







TABLE 4 
Values of G(m + r; 1/6, m) 


4 5 6 7 
| .1666667 .0277778 .0046296 .0°77161 .0°12860 021434 | .0535723 
| .3333333 | .0833333 .0185185 .0038580 .0°77161 .0715004 | .028578 
| .5000000 | .1666667 .0462963 .0115741 .0027006 .0760014 | .0712860 
.2777778 .0925926 .0270062 .0072017 .0018004 .0742867 

4166667 . 1620371 .0540124 .0162037 .0045010 .0011788 

5833333 . 2592593 .0972222 .0324074 .0099023 .0028292 

. 3750000 . 1589506 .0587706 -0196759 .0061050 

. 5000000 . 2391975 .0979939 .0358796 .0120599 

. 3356482 . 1520062 .0607639 .0220872 

.4436728 . 2214507 .0964720 .0378658 

. 5563272 . 3051698 . 1446331 | .0612211 

.3996914 .2058470 | .0938786 

. 5000000 .2793853 | .1371635 

. 3631044 .1917010 

.4535751 . 2571695 

.5464249 | .3321616 

.4142054 

- 5000000 


10 12 13 14 


O° 


| 


| 


-0°59537 .0799229 | .0716538 0827564 .0°45939 -0°°76566 .0°12761 
.0°53584 -0°99229 -0°18192 0733076 -0°59721 .0°10719 .0°19141 
0426792 .0°54576 .0510915 -0°21500 -0741805 .0°80394 | .0°15313 
.0°98237 .0421830 | .0°47299 -0°10033 -0°20902 .0°42877 | .0°86774 
0829471 .0*70949 -0°16555 .0°37624 -0°83610 .0°18223 | .0739048 
.0°76625 0719866 .0*49664 -0*12040 .0528427 .0°65602 | .0°14838 
| .0017832 0949575 .0713227 .0434082 -0°85227 -0520764 .0°49444 
| .0037884 .0011263 0931982 0187355 023076 -0559214 .0° 14812 
-0074481 -0023631 | .0°71276 0320597 -057368 -0*15476 .0°40591 
-0136877 -0046280 -0014805 -0°45192 -0°13252 0137528 .0*10307 
-0236947 .0085280 | .0028900 -0°93083 -028703 .0'85227 | .0'24481 
-0388696 .0148786 | .0053366 -0018120 -0°58702 -0°18259 -0°54803 
| .0607127 .0247002 | .0093707 | .0033517 -0011400 -0°37117 0711632 
.0906529 .0391776 -0157126 | .0059172 -0021116 -0°71925 | .0°23521 
. 1298333 -0595751 .0252479 -0100064 -0037450 -0013339 0745494 
. 1788826 -0871076 -0389945 -0162587 -0063795 | .0023750 0984452 
| .2377133 . 1227774 -0580361 .0254498 -0104660 -0040715 | .0015089 
. 3054002 .1671991 -0834228 -0384641 .0165747 .0067361 .0026012 
. 3801720 . 2204424 . 1160465 -0562434 -0253899 -0107778 | .0043356 
-4595282 .2819216 | .1565039 .0797086 -0376885 .0167073 | .0070003 
.5404718 -3503613 . 2049682 - 1096620 .0542978 .0251327 | .0109667 
-4238522 - 2610923 . 1466783 .0760344 -0367419 -0166945 
. 5000000 - 3239628 . 1909994 . 1036260 .0522685 | .0247274 
-3921209 | .2424491 | .1376235 -0724434 | .0356786 
- 4636536 .3003836 | .1783135 .0979306 .0502041 
. 5363464 . 3636907 . 2256439 1292565 -0689623 
-4308425 - 2791739 . 1667359 .0925628 
. 5000000 - 3380609 . 2104070 . 1215070 
.4010872 | .2599838 . 1561262 
-4667306 .3148350 . 1965248 
. 5332694 - 3739943 . 2425354 
-4362058 - 2936936 
. 5000000 . 3492377 
-4081355 
- 4691392 
- 5308608 


1 
2 
3 
4 | 
5 
6 
7 
8 
9 











TABLE 4—Continued 
r \ m     15     16     17     18     19     20     r*
0 .01'21268 .0'935447 -01359078 .0'*98464 .0'416411 .0'527351 
] . 0934029 .0"'60260 .0'110634 .0!218708 .01332821 .0'457437 
2 | .0°28925 .0°°54234 .0°10102 .0"118708 -0'934462 | .0'°63181 
3 | .0°17355 .0°34348 .0°°67349 .0"°13096 .0125272 | .01248439 
4 .0°82436 .0°17174 .0°35358 .0°72026 .0%14532 .0'129063 
5 .0732974 .0872131 .0°15558 .0°33132 . 069752 014532 | 
6 0°11538 .0726443 .0°59628 .0°13251 .0°29060 .0°§2965 
7 .0°36221 .0786805 .0720429 | .0°47298 .0°10789 .0°24277 
s .0510385 .0°25984 .0763726 .0715349 .0°36367 .0°84884 
9 .0°27548 .0°71868 .0°18345 | .0745913 .0711287 .0°27295 
10 | .0568284 .0518554 .0°49241 .0°12792 | .0732595 081595 
ll .0415938 -0545062 .0512422 .0°33469 088322 .0'22868 | 
12 .0435242 .0*10361 .0529646 | .0°82779 .0°22607 .0760498 | 
13 .0°74196 .0422666 -.0567278 | .0519457 | .0%54956 | .0°15191 | 


14 | .0914934 -0°47383 -014582 -0°43653 | .0°12746 | .0°36373 | 
15 | .0°28838 .0°94987 0130293 | .0593836 0528308 -0°83366 | 
16 | .0°53578 -0°18315 -0°60508 -0*19386 | .0°60406 -0°18350 | 
17 | .0°96017 -0°34052 -0711651 .0'38598 | .0°12418 | .0538899 | 
18 | .0016634 | .0°61188 .0321676 | .0'74231 | .0'24652 | .0579608 | 
19 | .0027909 .0010647 -0939043 | .0713818 | .0*47357 -0°15762 

20 | .0045429 -0017969 0368202 024942 .0*88200 -0*30250 

21 -0071845 .0029463 .0011572 | .0°43724 0915951 -0'56363 | 
22 | .0110543 | .0046994 .0019099 | .0°74548 - 0828052 .0310211 











| 

23 | .0165672 | .0073005 | .0030699 | .0012377 | .048038 | .0518010 
24 | .0242119 | .0110586 | .0048111 | .0020034 | .0°80191 | .0#30965 
25 | .0345389 | .0163500 | .0073586 | .0031648 | .0013064 | .0%51948 
26 | .0481383 | .0236159 | .0109951 | .0048836 | .0020787 | .0°85123 
27 | .0656070 | .0333529 | .0160629 | .0073679 | .0032338 | .0013636 27.111 
§ | .0875068 | .0460950 | .0229622 | .0108766 .0049224 | .0021373 28.177 

| -1143145 | .0623862 | .0321431 | .0157221 | .0073364 | .0032799 29.238 
30 | .1463697 | .0827459 | .0440910 | .0222688 | .0107140 | .0049319 | 30.291 
31 | .1838250 | .1076269 | .0593038 | .0309263 | .0153409 | .0072710 | 31.337 
32 | .2266041 | .1373712 | .0782631 | .0421376 | .0215499 | .0105162 | 32.377 
33 | .2743755 | .1721659 | .1013986 | .0563602 | .0297153 | .0149298 | 33.411 
34 | .3265444 | .2120055 | .1290503 | .0740416 | .0402428 | .0208166 | 34.440 
35 | .3822670 | .2566643 | .1614300 | .0955894 | .0535540 | .0285195 35.466 
36 | .4404886 | .3056841 | .1985863 | .1213386 | .0700656 | .0384114 36.486 
37 | .5000000 | .3583799 | .2403785 | .1515178 | .0901642 | .0508820 37.502 
38 .4138645 | .2864607 | .1862174 | .114177 0663199 38.515 
39 | 4710907 | .3362815 | .2253645 | .1423449 | .0850915 39.524 
40 | _5289093 | .3890988 | .2687060 | .1747889 | .1075159 40.531 
41 | 4440107 | .3158027 | .2114912 | .1338387 | 41.534 
42 .5000000 | .3660384 | .2522745 .1642069 | 42.536 
43 | | .4186402 | .2967949 | .1986453 43.535 
44 | .4727136 | .3445442 | .2370398 44.533 
45 .5272864 | .3948645 | .2791264 | 45.529 
46 4469735 | .8244905 | 46.524 
47 .5000000 | .3725753 47.517 
18 .4227006 48.511 
49 | .4740907 49.504 
50 | | 5259093 50.496 








$r$ for which $\alpha = G(m + r; 1/k, m)$ is not too small. For instance, in the case $k = 3$ and $r = 14$ ($\alpha = 0.0658545$), we find $r^* = 14.496$.

The tables suggest that, at least for $k = 3, 4, 5, 6$ and $m = 20$, the values $G(m + r; 1/k, m)$ are well approximated by (3.4) below for all integers $r$ satisfying

(3.3) $$\frac{m(k - 1)}{2} - 2\sqrt{\frac{m(k^2 - 1)}{12}} \le r \le \frac{m(k - 1)}{2} + 2\sqrt{\frac{m(k^2 - 1)}{12}}.$$

The approximation is

(3.4) $$N\left(m + r + \tfrac{1}{2};\ \frac{m(k + 1)}{2},\ \sqrt{\frac{m(k^2 - 1)}{12}}\right).$$
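The following is a short sketch of (3.2) and (3.4) for the equal-probability case. It uses Python's `statistics.NormalDist` for $N(\cdot; \mu, \sigma)$, and the function names are hypothetical. With $k = 3$, $m = 20$ and $\alpha = 0.0658545$ it should reproduce $r^* \approx 14.496$, the value quoted above.

```python
from math import sqrt
from statistics import NormalDist


def r_star(alpha, k, m):
    """Formula (3.2): approximate r with G(m + r; 1/k, m) = alpha."""
    z_alpha = NormalDist().inv_cdf(alpha)  # N(z_alpha; 0, 1) = alpha
    return m * (k - 1) / 2 + z_alpha * sqrt(m * (k * k - 1) / 12)


def normal_approx_G(r, k, m):
    """Formula (3.4): normal approximation to G(m + r; 1/k, m),
    including the continuity correction 1/2."""
    mean = m * (k + 1) / 2
    sd = sqrt(m * (k * k - 1) / 12)
    return NormalDist(mean, sd).cdf(m + r + 0.5)


print(r_star(0.0658545, k=3, m=20))
print(normal_approx_G(14, k=3, m=20))
```

The second call gives roughly 0.066, compared with the tabulated exact value 0.0658545. The continuity correction in (3.4) explains why $r^*$ in the tables is close to $r + 1/2$.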


4. Applications. The distribution of the statistic $S$ has many direct applications. In many situations the parent population has the distribution (2.1) and the distribution of the sum or of the mean of a random sample is needed. As a simple example, suppose each $X_i$ is the number scored with a throw of a perfect die. Then the probability distribution function of the sum of the numbers $X_1, X_2, \ldots, X_m$ is readily seen to be $g(s; 1/6, m)$.
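Here is a minimal sketch of how $G(s; p, m)$ can be computed exactly, by convolving the parent distribution with itself $m$ times. The die example suggests the setup assumed here: each $X_i$ takes the values $1, \ldots, k$ with probabilities $p_1, \ldots, p_k$. The function names are illustrative, not Tsao's. Exact rational arithmetic is used so that entries can be compared directly with the tables.

```python
from fractions import Fraction


def sum_distribution(probs, m):
    """Return {s: P(S = s)} for S = X_1 + ... + X_m, where each X_i takes
    the value j (j = 1, ..., k) with probability probs[j - 1]."""
    dist = {0: Fraction(1)}  # distribution of the empty sum
    for _ in range(m):
        new = {}
        for s, ps in dist.items():
            for j, pj in enumerate(probs, start=1):
                new[s + j] = new.get(s + j, 0) + ps * pj
        dist = new
    return dist


def G(s, probs, m):
    """Cumulative probability P(S <= s)."""
    return sum(pr for value, pr in sum_distribution(probs, m).items() if value <= s)


def uniform(k):
    """Equal probabilities p_1 = ... = p_k = 1/k."""
    return [Fraction(1, k)] * k


# Two dice (k = 6, m = 2): P(S <= 4) = 6/36
print(G(4, uniform(6), 2))  # prints 1/6
```

For example, `G(8, uniform(5), 3)` returns `53/125` = 0.424, which matches the entry of Table 3 for $m = 3$, $r = 5$.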

Besides direct applications, the statistic S, as pointed out previously, is the 
rank sum criterion in the goodness of fit test described in [1]. The function 
G(s; 1/k, m) provides levels of significance and some of the more general cdf’s 
G(s; p, m) serve as power functions of the rank sum tests. For the detailed pro- 
cedure of the test, the reader is referred to [1]. 

As another application, the distributions of S are useful in calculating the levels 
of significance and the power of the sequential rank sum tests. The detailed pro- 
cedure is discussed in [2]. 

Finally, a special case of (2.4) can serve as a generating function in partition and permutation problems. In fact, the function $k^m D(t; 1/k, m)$ generates the number of partitions of an integer $s$ into an array of $m$ integers, each greater than or equal to 1 and less than or equal to $k$. Here "partitions" means the representations of $s$ as a sum of $m$ integers, with different permutations of the same set of $m$ integers counted as different representations. Thus, for $1 \le m \le 20$ and $3 \le k \le 6$, such a number can be found from the entries in the four tables.
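The generating-function remark can be made concrete with a minimal sketch that expands $(t + t^2 + \cdots + t^k)^m$ and reads off the coefficient of $t^s$. This assumes, consistent with the statement above, that $k^m D(t; 1/k, m)$ equals this polynomial, i.e. that $D$ is the generating function of $S$ when $p_1 = \cdots = p_k = 1/k$. The function name is illustrative.

```python
def compositions(s, m, k):
    """Number of ordered m-tuples of integers in [1, k] summing to s,
    i.e. the coefficient of t**s in (t + t**2 + ... + t**k)**m."""
    coeffs = [1]  # coefficients of the polynomial 1, indexed by exponent
    for _ in range(m):
        new = [0] * (len(coeffs) + k)
        for e, c in enumerate(coeffs):
            for j in range(1, k + 1):
                new[e + j] += c
        coeffs = new
    return coeffs[s] if 0 <= s < len(coeffs) else 0


# Three dice: 27 of the 6**3 outcomes give a total of 10
print(compositions(10, 3, 6))  # prints 27
```

In terms of the tables, the same count is $k^m\,[G(s; 1/k, m) - G(s - 1; 1/k, m)]$.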


5. Relation of this paper to a result of Whitfield. A referee pointed out that some of the tables in this paper had already been calculated by Whitfield [4], who gives $G(s; 1/k, m)$ to five decimal places for $3 \le k \le 8$ and $2 \le m \le 8$. On comparing the present tables with Whitfield's, the author found that several entries disagree.


REFERENCES 


1. Chia Kuei Tsao, "Rank sum tests of fit," Ann. Math. Stat., Vol. 26 (1955), pp. 94-104.

2. Chia Kuei Tsao, "Sequential rank sum tests," submitted for publication in the Ann. Math. Stat.

3. Guide to Tables in the Theory of Numbers, Bulletin of the National Research Council, No. 105, 1941, Washington, D. C.

4. J. W. Whitfield, "The distribution of total rank value for one particular object in m rankings of n objects," Brit. J. Stat. Psychol., Vol. 6 (1953), pp. 35-40.





ON THE DISTRIBUTION OF THE NUMBER OF SUCCESSES
IN INDEPENDENT TRIALS¹


By Wassily Hoeffding


University of North Carolina 


1. Introduction and summary. Let $S$ be the number of successes in $n$ independent trials, and let $p_j$ denote the probability of success in the $j$th trial, $j = 1, 2, \ldots, n$ (Poisson trials). We consider the problem of finding the maximum and the minimum of $Eg(S)$, the expected value of a given real-valued function of $S$, when $ES = np$ is fixed.

It is well known that the maximum of the variance of $S$ is attained when $p_1 = p_2 = \cdots = p_n = p$. This can be interpreted as showing that the variability in the number of successes is highest when the successes are equally probable (Bernoulli trials). Two theorems proved in this paper further support this interpretation:

(a) If $b$ and $c$ are two integers with $0 \le b \le np \le c \le n$, the probability $P(b \le S \le c)$ attains its minimum if and only if $p_1 = p_2 = \cdots = p_n = p$, unless $b = 0$ and $c = n$. This is Theorem 5, a corollary of Theorem 4, which gives the maximum and the minimum of $P(S \le c)$.

(b) If $g$ is a strictly convex function, $Eg(S)$ attains its maximum if and only if $p_1 = p_2 = \cdots = p_n = p$ (Theorem 3).

These results are obtained with the help of two theorems concerning the extrema of the expected value of an arbitrary function $g(S)$ under the condition $ES = np$. Theorem 1 gives necessary conditions for the maximum and the minimum of $Eg(S)$. Theorem 2 gives a partial characterization of the set of points at which an extremum is attained. Corollary 2.1 states that the maximum and the minimum are attained when $p_1, p_2, \ldots, p_n$ take on at most three different values, only one of which is distinct from 0 and 1. Section 5 points out applications of Theorems 3 and 5 to problems of estimation and testing.


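2. The extrema of the expected value of an arbitrary function of S. The expected value of a function $g(S)$ is

(1) $$f(p) = Eg(S) = \sum_{k=0}^{n} g(k)\, A_{n,k}(p),$$

where $p = (p_1, p_2, \ldots, p_n)$. Here $A_{n,k}(p)$, the probability of $S = k$, is given by

$$A_{n,k}(p) = \sum \prod_{j=1}^{n} p_j^{i_j}(1 - p_j)^{1 - i_j}, \qquad k = 0, 1, \ldots, n,$$

the sum being extended over all $(i_1, \ldots, i_n)$ with $i_j = 0$ or $1$ and $i_1 + \cdots + i_n = k$.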
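For numerical experiments, $A_{n,k}(p)$ can be computed without enumerating all $2^n$ outcomes, by adding one trial at a time. The following minimal sketch computes $A_{n,k}(p)$ and the function $f(p)$ of (1). The function names are illustrative, not the paper's.

```python
def success_probs(p):
    """Return [A_{n,0}(p), ..., A_{n,n}(p)]: the distribution of the number S
    of successes in independent trials with success probabilities p."""
    A = [1.0]  # zero trials: S = 0 with probability 1
    for pj in p:
        new = [0.0] * (len(A) + 1)
        for k, a in enumerate(A):
            new[k] += a * (1 - pj)   # this trial fails
            new[k + 1] += a * pj     # this trial succeeds
        A = new
    return A


def expected_g(g, p):
    """f(p) = E g(S) = sum over k of g(k) * A_{n,k}(p), as in (1)."""
    return sum(g(k) * a for k, a in enumerate(success_probs(p)))


print(success_probs([0.5, 0.5]))  # prints [0.25, 0.5, 0.25]
```

Since $f$ is symmetric and linear in each $p_j$, permuting the entries of `p` leaves `expected_g` unchanged, and replacing one entry by $x$ gives an affine function of $x$. Both properties are easy to confirm numerically.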


The function $f(p)$ is symmetric in the components of $p$ and linear in each component. Conversely, any function of $p$ with these two properties can be represented in the form (1). The problem considered here is to find the maximum and the minimum of $f(p)$ in the section $D$ of the hyperplane

$$p_1 + p_2 + \cdots + p_n = np \qquad (0 \le p \le 1)$$

that lies in the closed hypercube

$$0 \le p_j \le 1, \qquad j = 1, 2, \ldots, n.$$

Received April 15, 1955.
¹ This research was supported, in part, by the United States Air Force through the Office of Scientific Research of the Air Research and Development Command.

714 WASSILY HOEFFDING


We denote by $p^{i_1 i_2 \cdots i_m}$ the point in $(n - m)$-dimensional space obtained from $p$ by omitting the coordinates $p_{i_1}, p_{i_2}, \ldots, p_{i_m}$. Since $f(p)$ is symmetric and linear in each component, we can write

(2) $$f(p) = f_{n-1,0}(p^{j}) + p_j\, f_{n-1,1}(p^{j}), \qquad j = 1, 2, \ldots, n.$$

Here the functions $f_{n-1,0}$ and $f_{n-1,1}$ do not depend on the index $j$, and they are symmetric and linear in the components of $p^{j}$. In general, we define the functions $f_{n-k,i}$ by $f_{n,0}(p) = f(p)$ and

(3) $$f_{n-k,i}(p^{1 \cdots k}) = f_{n-k-1,i}(p^{1 \cdots k+1}) + p_{k+1}\, f_{n-k-1,i+1}(p^{1 \cdots k+1}), \qquad i = 0, 1, \ldots, k.$$


Applying (3) repeatedly, we obtain

(4) $$f(p) = \sum_{i=0}^{m} C_{m,i}(p_1, p_2, \ldots, p_m)\, f_{n-m,i}(p^{1 \cdots m}),$$

where $C_{m,0}, C_{m,1}, \ldots, C_{m,m}$ are the symmetric sums

(5) $$C_{m,0}(p_1, \ldots, p_m) = 1, \qquad C_{m,i}(p_1, \ldots, p_m) = (p_1 p_2 \cdots p_i) + (p_1 \cdots p_{i-1} p_{i+1}) + \cdots + (p_{m-i+1} p_{m-i+2} \cdots p_m), \quad i > 0.$$

Write $(0^u 1^v)$ for the point whose first $u$ coordinates are 0 and whose remaining $v$ coordinates are 1. Setting $(p_1, p_2, \ldots, p_m) = (0^{m-h} 1^h)$, $h = 0, 1, \ldots, m$, in (4) gives a system of linear equations for $f_{n-m,i}(p^{1 \cdots m})$, whose solution is

(6) $$f_{n-m,i}(p^{1 \cdots m}) = \sum_{h=0}^{i} (-1)^{i-h} \binom{i}{h} f(0^{m-h} 1^{h}, p_{m+1}, \ldots, p_n), \qquad i = 0, 1, \ldots, m.$$


Theorem 1. Let $a = (a_1, a_2, \ldots, a_n)$ be a point in $D$ at which $f(p)$ attains its maximum. Then for every two distinct indices $i, j$ we have

(7) $f_{n-2,2}(a^{ij}) \le 0$ if $a_i \ne a_j$,

(8) $f_{n-2,2}(a^{ij}) = 0$ if $a_i \ne a_j$, $0 < a_i < 1$, $0 < a_j < 1$,

(9) $f_{n-2,2}(a^{ij}) \ge 0$ if $0 < a_i = a_j < 1$.





INDEPENDENT TRIALS 


The inequalities (7) and (9) are strict if the maximum is not attained at the points of $D$ that differ from $a$ only in that $a_i$ and $a_j$ are replaced by $a_i + x$ and $a_j - x$, with $|x|$ positive and arbitrarily small.

Proof. Let $a'$ denote the point obtained from $a$ by replacing $a_i$ and $a_j$ with $a_i + x$ and $a_j - x$. The point $a'$ is in $D$ for all $x$ in the interval $J$ defined by $0 \le a_i + x \le 1$, $0 \le a_j - x \le 1$. By (4) we have

$$f(a') = f_{n-2,0}(a^{ij}) + (a_i + a_j)\, f_{n-2,1}(a^{ij}) + (a_i + x)(a_j - x)\, f_{n-2,2}(a^{ij}).$$

Hence,

(10) $$f(a') - f(a) = x(a_j - a_i - x)\, f_{n-2,2}(a^{ij}).$$

Since $f(a)$ is a maximum, the right side of (10) must be negative or zero for all $x$ in $J$. We may assume that $a_i \le a_j$.

If $a_i < a_j$ and $x$ is positive and sufficiently small, then $x$ is in $J$ and $x(a_j - a_i - x) > 0$. Hence, (7) must hold.

If $0 < a_i < 1$ and $0 < a_j < 1$, then $x = 0$ is an interior point of $J$, so $x$ can take small values of either sign. For $a_i < a_j$ the factor $x(a_j - a_i - x)$ then takes both signs, which forces (8). For $a_i = a_j$ the factor is $-x^2 \le 0$, which forces (9).

If the maximum is not attained at $a'$ when $x$ is in $J$ and is different from and sufficiently close to zero, the inequalities (7) and (9) must be strict. The proof is complete.

The following explicit expressions for $f_{n-2,2}(a^{ij})$ will be useful in applying Theorem 1. It is easily seen (for instance, from probability considerations) that

$$A_{n,k}(0^u 1^v, p_{u+v+1}, \ldots, p_n) = A_{n-u-v,\,k-v}(p_{u+v+1}, \ldots, p_n).$$

Hence, from (6) and (1),

(11) $$f_{n-2,2}(a^{ij}) = \sum_{k=0}^{n} g(k)\,\{A_{n-2,k-2}(a^{ij}) - 2A_{n-2,k-1}(a^{ij}) + A_{n-2,k}(a^{ij})\}.$$

Alternatively, this can be written in the forms

(12) $$f_{n-2,2}(a^{ij}) = \sum_{k=0}^{n-1} \{g(k+1) - g(k)\}\,\{A_{n-2,k-1}(a^{ij}) - A_{n-2,k}(a^{ij})\},$$

(13) $$f_{n-2,2}(a^{ij}) = \sum_{k=0}^{n-2} \{g(k+2) - 2g(k+1) + g(k)\}\, A_{n-2,k}(a^{ij}),$$

where $A_{n-2,k} = 0$ for $k < 0$ and for $k > n - 2$.
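Identity (10), with $f_{n-2,2}$ evaluated through (13), can be checked numerically. The sketch below includes its own helper for $A_{n,k}$ so that it runs on its own. The point $a$, the shift $x$ and the functions $g$ are arbitrary choices made only for illustration.

```python
def success_probs(p):
    """[A_{n,0}(p), ..., A_{n,n}(p)] for independent trials with probabilities p."""
    A = [1.0]
    for pj in p:
        new = [0.0] * (len(A) + 1)
        for k, a in enumerate(A):
            new[k] += a * (1 - pj)
            new[k + 1] += a * pj
        A = new
    return A


def expected_g(g, p):
    """f(p) = E g(S), equation (1)."""
    return sum(g(k) * a for k, a in enumerate(success_probs(p)))


def f22(g, rest):
    """f_{n-2,2} at the point `rest` (a with a_i and a_j omitted), via (13)."""
    return sum((g(k + 2) - 2 * g(k + 1) + g(k)) * a
               for k, a in enumerate(success_probs(rest)))


def check_identity_10(g, a, i, j, x):
    """Return both sides of (10) for the move a_i -> a_i + x, a_j -> a_j - x."""
    shifted = list(a)
    shifted[i] += x
    shifted[j] -= x
    rest = [value for idx, value in enumerate(a) if idx not in (i, j)]
    lhs = expected_g(g, shifted) - expected_g(g, a)
    rhs = x * (a[j] - a[i] - x) * f22(g, rest)
    return lhs, rhs


lhs, rhs = check_identity_10(lambda k: k ** 3, [0.3, 0.6, 0.2, 0.9], 0, 1, 0.1)
print(lhs, rhs)  # the two numbers agree up to rounding error
```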

In general, the maximum or the minimum of $f(p)$ can be attained at more than one point in $D$. For example, if $np < n - 1$, the function $p_1 p_2 \cdots p_n$ attains its minimum 0 at every point in $D$ with at least one zero coordinate, and there are infinitely many such points. The following theorem gives some information about the set of points at which an extremum is attained.

Theorem 2. Let $a$ be a point in $D$ at which $f(p)$ attains its maximum or its minimum. Suppose that $a$ has at least two unequal coordinates which are distinct from 0 and 1. Then,







(i) $f(p)$ attains its maximum (or minimum) at any point in $D$ which has the same number of zero coordinates and the same number of unit coordinates as $a$;

(ii) if $a$ has exactly $r$ zero coordinates and $s$ unit coordinates, the maximum (or minimum) of $f(p)$ is equal to

(14) $$f(a) = (1 - np + s)\,g(s) + (np - s)\,g(s + 1),$$

and we have

(15) $$g(s + k) = k\,g(s + 1) - (k - 1)\,g(s), \qquad k = 2, \ldots, n - r - s.$$


Proof. Let $m = n - r - s$ be the number of coordinates of $a = (a_1, a_2, \ldots, a_n)$ which are distinct from 0 and 1. We may take $a_1, a_2, \ldots, a_m$ to be these coordinates, and we may assume that $a_1 \ne a_2$. We first show that

(16) $$f_{n-k,i}(a_{k+1}, \ldots, a_n) = 0, \qquad i = 2, \ldots, k,$$

for $k = 2, \ldots, m$.

We prove (16) by induction on $k$. For $k = 2$, (16) follows from Theorem 1, (8). Assume that (16) is true for a fixed $k$, $2 \le k < m$. Let

(17) $$b = (b_1, \ldots, b_k, a_{k+1}, \ldots, a_n),$$

where

(18) $$b_1 + \cdots + b_k = a_1 + \cdots + a_k, \qquad 0 \le b_i \le 1, \quad i = 1, \ldots, k.$$

The point $b$ is in $D$. By (4) and the induction hypothesis,

(19) $$f(b) = f_{n-k,0}(a_{k+1}, \ldots, a_n) + (a_1 + \cdots + a_k)\, f_{n-k,1}(a_{k+1}, \ldots, a_n) = f(a).$$

Thus, the maximum is attained at every point $b$ which satisfies (17) and (18). Moreover, since $0 < a_1 + \cdots + a_k < k$, (18) can be satisfied with $b_1 \ne b_2$, $b_1 \ne a_{k+1}$, $b_2 \ne a_{k+1}$, and $0 < b_i < 1$ for $i = 1, 2, \ldots, k$. Under these assumptions, we can apply the induction hypothesis (16) with $a$ replaced by the point $b$, after suitably rearranging its first $k + 1$ coordinates. Hence,

$$f_{n-k,i}(b_1, a_{k+2}, \ldots, a_n) = f_{n-k,i}(b_2, a_{k+2}, \ldots, a_n) = 0, \qquad i = 2, \ldots, k.$$

Applying (3) to the left sides of these equations, we obtain

$$f_{n-k-1,i}(a_{k+2}, \ldots, a_n) + b_h\, f_{n-k-1,i+1}(a_{k+2}, \ldots, a_n) = 0, \qquad i = 2, \ldots, k; \quad h = 1, 2.$$

Since $b_1 \ne b_2$, (16) is satisfied with $k$ replaced by $k + 1$. Thus, (16) holds for $k = 2, \ldots, m$.

By (16) with $k = m$, equation (19) holds with $k = m$ for every $b$ which satisfies (17) and (18). Since $f$ is symmetric, this implies part (i) of the theorem.

To prove part (ii), we observe that

$$a_1 + a_2 + \cdots + a_m = np - s,$$

and we can put $(a_{m+1}, \ldots, a_n) = (0^r 1^s)$. Hence, by (19) and (16), with $k = m$,

(20) $$f(a) = f_{n-m,0}(0^r 1^s) + (np - s)\, f_{n-m,1}(0^r 1^s)$$

and

(21) $$f_{n-m,i}(0^r 1^s) = 0, \qquad i = 2, \ldots, m.$$

Applying (6) and then (1), we obtain

$$f_{n-m,i}(0^r 1^s) = \sum_{h=0}^{i} (-1)^{i-h}\binom{i}{h} f(0^{m-h} 1^{h}, 0^r 1^s) = \sum_{h=0}^{i} (-1)^{i-h}\binom{i}{h} g(s + h), \qquad i = 0, 1, \ldots, m.$$

Hence, (14) follows from (20). Equations (21) state that the second differences of $g$ are zero in the indicated range. Therefore, the first differences $g(s + k) - g(s + k - 1)$ are constant for $k = 1, 2, \ldots, m$, which is equivalent to (15). The proof is complete.

The following immediate corollary of Theorem 2(i) is often convenient for 
finding an extremum. 

Corollary 2.1. The maximum and the minimum of $f(p)$ in $D$ are attained at points whose coordinates take on at most three different values, only one of which is distinct from 0 and 1.

Thus, to find an extremum, it is sufficient to determine the numbers r and s of 
the zero and unit coordinates of an extremal point whose remaining coordinates 
are all equal. We shall see that r and s can sometimes be determined with the 
help of Theorem 1. If an extremum is attained at only one point (except perhaps 
for permutations of the coordinates), part (ii) of Theorem 2 will prove useful to 
establish the uniqueness. 


3. The maximum of the expected value of a convex function of S.

Theorem 3. If $ES = np$ and

(22) $$g(k + 2) - 2g(k + 1) + g(k) > 0, \qquad k = 0, 1, \ldots, n - 2,$$

then

(23) $$Eg(S) \le \sum_{k=0}^{n} g(k)\binom{n}{k} p^k (1 - p)^{n-k},$$

with equality if and only if $p_1 = p_2 = \cdots = p_n = p$.

In particular, every absolute moment $E(|S - b|^c)$ of $S$ of order $c > 1$, about an arbitrary point $b$, attains its maximum if and only if all of the $p_j$ are equal.
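As a numerical illustration of Theorem 3, the sketch below evaluates $E(|S - b|^c)$ for a few probability vectors with the same mean and compares each value with the binomial value on the right of (23). For $c > 1$ this $g$ satisfies (22). The vectors, $b = 1.2$ and $c = 1.5$ are arbitrary choices, and the helper `success_probs` (an illustrative name) computes $A_{n,k}(p)$.

```python
from math import comb


def success_probs(p):
    """[A_{n,0}(p), ..., A_{n,n}(p)] for independent trials with probabilities p."""
    A = [1.0]
    for pj in p:
        new = [0.0] * (len(A) + 1)
        for k, a in enumerate(A):
            new[k] += a * (1 - pj)
            new[k + 1] += a * pj
        A = new
    return A


def expected_g(g, p):
    """E g(S) under success probabilities p."""
    return sum(g(k) * a for k, a in enumerate(success_probs(p)))


def binomial_bound(g, n, p):
    """Right side of (23): E g(S) when all n probabilities equal p."""
    return sum(g(k) * comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1))


def g(k):
    return abs(k - 1.2) ** 1.5  # absolute moment of order c = 1.5 about b = 1.2


bound = binomial_bound(g, 3, 0.5)
for p in ([0.2, 0.5, 0.8], [0.1, 0.6, 0.8], [0.0, 0.5, 1.0], [0.3, 0.3, 0.9]):
    print(p, expected_g(g, p), "<", bound)  # each vector has mean 0.5, so np = 1.5
```

With $g(k) = k^2$ the same comparison reduces to the familiar fact about the variance. For $p = (0.2, 0.5, 0.8)$ one gets $ES^2 = 2.82$, against $3.0$ for Bernoulli trials.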

Proof of Theorem 3. Let $a = (a_1, a_2, \ldots, a_n)$ be a point in $D$ at which $f(p) = Eg(S)$ attains its maximum. Suppose that $a_i \ne a_j$ for some $i, j$. By Theorem 1, $f_{n-2,2}(a^{ij}) \le 0$. By (13) and (22), this implies $A_{n-2,k}(a^{ij}) = 0$ for $k = 0, 1, \ldots, n - 2$. But this is impossible, since the probabilities $A_{n-2,0}, A_{n-2,1}, \ldots, A_{n-2,n-2}$ sum to 1. Hence, the maximum is attained if and only if all the $a_i$ are equal, i.e., if $a_1 = a_2 = \cdots = a_n = p$. This implies (23) and completes the proof.

Observe that the proof of Theorem 3 makes no use of Theorem 2; only inequality (7) of Theorem 1 is needed.

4. The extrema of certain probabilities. In this section we determine the maxima and the minima of the probabilities $P(S \le c)$ and $P(b \le S \le c)$ when $ES = np$.

Theorem 4. If $ES = np$ and $c$ is an integer, then

(24) $$0 \le P(S \le c) \le \sum_{k=0}^{c}\binom{n}{k} p^k (1 - p)^{n-k} \qquad \text{if } 0 \le c \le np - 1,$$

(25) $$0 < 1 - Q(n - c - 1, 1 - p) \le P(S \le c) \le Q(c, p) < 1 \qquad \text{if } np - 1 < c < np,$$

(26) $$\sum_{k=0}^{c}\binom{n}{k} p^k (1 - p)^{n-k} \le P(S \le c) \le 1 \qquad \text{if } np \le c \le n,$$

where

(27) $$Q(c, p) = \max_{0 \le s \le c}\ \sum_{k=0}^{c-s} \binom{n-s}{k} a^k (1 - a)^{n-s-k},$$

(28) $$a = \frac{np - s}{n - s}.$$

The maximizing value of $s$ satisfies the inequality

(29) $$(c + 1 - np)(n - s) < n - np,$$

unless $c = n - 1$, in which case $s = n - 1$.

All bounds are attained. The upper bound for $0 \le c \le np - 1$ and the lower bound for $np \le c \le n$ are attained only if $p_1 = p_2 = \cdots = p_n = p$.

Theorem 5. If $ES = np$, and $b$ and $c$ are two integers such that

$$0 \le b \le np \le c \le n,$$

then

(30) $$\sum_{k=b}^{c}\binom{n}{k} p^k (1 - p)^{n-k} \le P(b \le S \le c) \le 1.$$

Both bounds are attained. Unless $b = 0$ and $c = n$, the lower bound is attained only if $p_1 = p_2 = \cdots = p_n = p$.
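The quantity $Q(c, p)$ of (27) and (28) is a finite maximum and is easy to evaluate. The sketch below computes it, assuming $0 \le c < np$, so that $n - s \ge 1$ and $0 < a \le 1$ for every $s \le c$. It then checks the bounds of (25) for one probability vector with $n = 4$, $p = 0.4$ and $c = 1$, which satisfies $np - 1 < c < np$. The function names are illustrative.

```python
from math import comb


def binom_cdf(c, n, a):
    """P(X <= c) for X ~ Binomial(n, a)."""
    return sum(comb(n, k) * a**k * (1 - a)**(n - k) for k in range(c + 1))


def Q(c, p, n):
    """(27)-(28): max over s = 0, ..., c of P(Binomial(n - s, a) <= c - s),
    with a = (np - s)/(n - s).  Assumes 0 <= c < np."""
    return max(binom_cdf(c - s, n - s, (n * p - s) / (n - s)) for s in range(c + 1))


def success_probs(p):
    """[A_{n,0}(p), ..., A_{n,n}(p)] for independent trials with probabilities p."""
    A = [1.0]
    for pj in p:
        new = [0.0] * (len(A) + 1)
        for k, a in enumerate(A):
            new[k] += a * (1 - pj)
            new[k + 1] += a * pj
        A = new
    return A


n, p, c = 4, 0.4, 1                 # np - 1 < c < np, the case of (25)
upper = Q(c, p, n)                  # attained at s = 1: 0.8**3 = 0.512
lower = 1 - Q(n - c - 1, 1 - p, n)  # attained at s = 2: 1 - 0.8**2 = 0.36
prob = sum(success_probs([0.1, 0.3, 0.5, 0.7])[: c + 1])  # P(S <= 1), mean 0.4
print(lower, prob, upper)
```

The upper bound is attained, as the proof below shows, at a point of the form $(a^{n-s} 1^s)$, here $(1, 0.2, 0.2, 0.2)$.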

Proof of Theorem 4. We first consider the maximum of $f(p) = P(S \le c \mid p)$ in $D$. By Corollary 2.1, the maximum is attained at a point $a = (0^r a^{n-r-s} 1^s)$ (in a notation similar to that of Section 2), where $r \ge 0$, $s \ge 0$, $n - r - s \ge 0$, and

$$(n - r - s)\,a = np - s.$$





If $c \ge np$, let $s$ be the greatest integer contained in $np$, and let $r = n - s - 1$. Then $a = np - s$ and $P(S \le c \mid a) = 1$. Hence, the (obvious) upper bound in (26) is attained.

Now let $0 \le c < np$. If $s > c$, then $P(S \le c \mid a) = 0$. But

$$P(S \le c \mid p, p, \ldots, p) > 0$$

for all $c \ge 0$. Hence, we must have $s \le c$. Since $a \le 1$, we have $n - r \ge np$. If $n - r = np$, then $a = (0^r 1^{n-r})$ and $n - r > c$, hence $P(S \le c \mid a) = 0$. Thus, we must have $n - r > np$. Consequently,

$$s \le c < np < n - r,$$

and this implies $0 < a < 1$.
We have $P(S \le c) = Eg(S)$, where $g(k) = 1$ or $0$ according as $k \le c$ or $k > c$. Hence, by (12),

$$f_{n-2,2}(a^{ij}) = A_{n-2,c}(a^{ij}) - A_{n-2,c-1}(a^{ij}).$$

If $a = (0^r a^{n-r-s} 1^s)$, then $a^{ij}$ is of the form

$$a^{ij} = (0^u a^{n-u-v-2} 1^v).$$

Then

$$A_{n-2,k}(a^{ij}) = \binom{n-u-v-2}{k-v} a^{k-v}(1 - a)^{n-u-k-2},$$

$$f_{n-2,2}(a^{ij}) = \frac{(n - u - v - 2)!}{(c - v)!\,(n - c - u - 1)!}\; a^{c-v-1}(1 - a)^{n-c-u-2}\,\{(n - u - v - 1)a - c + v\}.$$

Since $0 < a < 1$, it follows that if $v \le c \le n - u - 1$, then $f_{n-2,2}(a^{ij})$ has the same sign as

$$(n - u - v - 1)a - c + v.$$

Suppose that $r > 0$. By Theorem 1 with $a_i = 0$, $a_j = a$, we must have $f_{n-2,2}(0^{r-1} a^{n-r-s-1} 1^s) \le 0$. Hence, $(n - r - s)a - c + s = np - c \le 0$, which contradicts the assumption $c < np$. Thus, $r = 0$, $a = (a^{n-s} 1^s)$, $(n - s)a = np - s$, and $0 \le s \le c$.

Suppose that $s > 0$. By Theorem 1 with $a_i = a$, $a_j = 1$, we must have $f_{n-2,2}(a^{n-s-1} 1^{s-1}) \le 0$, i.e.,

(31) $$(n - s)a - c + s - 1 = np - c - 1 \le 0.$$

Hence, if $c < np - 1$, we must have $r = s = 0$ and $a = p$. Thus, the second inequality (24) holds for $0 \le c < np - 1$, and the bound is attained. (We postpone the proof for $c = np - 1$.)

Now suppose that $n - s > 1$. By Theorem 1 with $a_i = a_j = a$, we must have $f_{n-2,2}(a^{n-s-2} 1^s) \ge 0$, i.e., $(n - s - 1)a - c + s \ge 0$. This is equivalent to

(32) $$(c + 1 - np)(n - s) \le n - np.$$

If $c = n - 1$, (32) contradicts the assumption $n - s > 1$, so we must have $s = n - 1$. If $c \ne n - 1$, then $c < n - 1$ and $n - s > 1$, so (32) must be satisfied. Hence, if $c < np$, the maximum of $P(S \le c)$ is $Q(c, p)$ as defined in (27) and (28), and the maximizing value of $s$ satisfies (32) and equals $n - 1$ if $c = n - 1$. (We postpone the proof of strict inequality in (32) for $c \ne n - 1$.) Since $a > 0$ and $c - s < n - s$, we have $Q(c, p) < 1$.

We next show that if $0 \le c < np$, the maximum can be attained only at a point whose coordinates distinct from 0 and 1 are all equal. Suppose the maximum is attained at a point $a$ which has at least two unequal coordinates distinct from 0 and 1, and let $s$ be the number of unit coordinates in $a$. By Theorem 2, equation (14), we must have $f(a) = 1$ if $s < c$, $f(a) = 1 - np + s$ if $s = c$, and $f(a) = 0$ if $s > c$. Since for $0 \le c < np$ the maximum is positive and less than 1, we must have $s = c$. By (15) with $s = c$ and $k = 2$, we would then have $g(c + 2) = 2g(c + 1) - g(c) = -1$, which is false. Hence, the coordinates of $a$ which are not 0 or 1 must all be equal.

By Theorem 1, this implies that the inequalities in (31) and (32) are strict. All statements of Theorem 4 concerning the upper bounds now follow easily.


The statements concerning the lower bounds follow from the equation

$$P(S \le c \mid \mathbf{p}) = 1 - P(S \le n - c - 1 \mid \mathbf{q}),$$

where $\mathbf{q} = (1 - p_1, 1 - p_2, \cdots, 1 - p_n)$. The proof is complete.


Proof of Theorem 5. Since $P(b \le S \le c) = P(S \le c) - P(S \le b - 1)$, the lower bound in (30) and the condition for its attainment follow from Theorem 4. The upper bound 1 is attained at $(0^{n-c} a^{c-b} 1^b)$, where $(c - b)a = np - b$.


5. Statistical applications. The lower bound for $P(b \le S \le c)$, which is given in Theorem 5, shows that the usual (one-sided and two-sided) tests for the constant probability of “success” in $n$ independent (Bernoulli) trials can be used as tests for the average probability $p$ of success when the probability of success varies from trial to trial. That is to say, the significance level of these tests (which is understood as the upper bound for the probability of an error of the first kind) remains unchanged. Moreover, we can obtain lower bounds for the power of these tests when the alternative is not too close to the hypothesis which is being tested. (Very roughly, the significance level has to be less than ½ and the power greater than ½.) We can also obtain a confidence interval for $p$ with a prescribed (sufficiently high) confidence coefficient and an upper bound for the probability that the confidence interval covers a wrong value of $p$ when the latter is not too close to the true value. Details are left to the reader.
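A small numerical check may make the first claim concrete. The sketch below is ours, not part of the paper: it computes the exact distribution of $S$ for unequal $p_i$ by convolving one trial at a time and compares $P(b \le S \le c)$ with the binomial probability at the average $p$. The $p_i$ and the interval ($b = 2$, $c = 3$, bracketing $np = 2.5$) are hypothetical choices; one example illustrates the inequality but proves nothing.

```python
from math import comb


def success_count_pmf(ps):
    """Exact distribution of S, the number of successes in independent trials
    with success probabilities ps, built up one trial at a time."""
    pmf = [1.0]
    for p in ps:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1 - p)   # this trial fails
            nxt[k + 1] += mass * p     # this trial succeeds
        pmf = nxt
    return pmf


def binomial_pmf(n, p):
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]


ps = [0.1, 0.3, 0.5, 0.7, 0.9]         # hypothetical unequal success probabilities
n, pbar = len(ps), sum(ps) / len(ps)   # n * pbar = 2.5
b, c = 2, 3                            # hypothetical interval with b <= n * pbar <= c

p_unequal = sum(success_count_pmf(ps)[b:c + 1])
p_binomial = sum(binomial_pmf(n, pbar)[b:c + 1])
print(f"P(b <= S <= c): unequal p_i {p_unequal:.4f}, binomial {p_binomial:.4f}")
```

With these values the first probability is about 0.738 and the binomial one is 0.625, so the acceptance region is, if anything, more likely to be reached when the $p_i$ vary.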

Theorem 3 can be applied in certain point-estimation problems. Suppose we 








INDEPENDENT TRIALS 




want to estimate a function $\theta(p)$, and the loss due to saying $\theta(p) = t$ is $W(p, t)$. If the estimator $t(S)$ is a function of $S$ only and if $W(p, t(S))$ is a strictly convex function of $S$ for every $p$, then Theorem 3 implies that the risk, $E\,W(p, t(S))$, is maximized when all the $p_i$ are equal. It follows, in particular, that if $t(S)$ is a minimax estimator under the assumption that the $p_i$ are all equal, it retains this property when the assumption is not satisfied (with no restriction on the class of estimators).
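In the same spirit, the sketch below evaluates the squared-error risk of $S/n$ (a convex function of $S$) for a hypothetical set of unequal $p_i$ and for the case in which every $p_i$ equals their average. It is an illustration under these assumed values, not a proof of Theorem 3.

```python
def success_count_pmf(ps):
    """Exact distribution of the number of successes in independent trials."""
    pmf = [1.0]
    for p in ps:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1 - p)
            nxt[k + 1] += mass * p
        pmf = nxt
    return pmf


def risk(pmf, n, pbar):
    """Squared-error risk E[(S/n - pbar)^2] of the estimator S/n."""
    return sum(mass * (k / n - pbar) ** 2 for k, mass in enumerate(pmf))


ps = [0.05, 0.2, 0.4, 0.6, 0.75, 0.9]   # hypothetical unequal p_i
n, pbar = len(ps), sum(ps) / len(ps)

risk_unequal = risk(success_count_pmf(ps), n, pbar)
risk_equal = risk(success_count_pmf([pbar] * n), n, pbar)
print(risk_unequal, risk_equal)
```

Here the two risks reduce to $\sum p_i(1 - p_i)/n^2$ and $\bar p(1 - \bar p)/n$, so the comparison is the familiar inequality $\sum p_i^2 \ge n\bar p^2$.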

One may doubt whether these problems are statistically meaningful, since the 
average probability of success depends on the sample size. The main interest of 
these results to the practicing statistician seems to be in cases where he assumes 
that the probability of success is constant, but there is the possibility that this 
assumption is violated. 





VARIANCES OF VARIANCE COMPONENTS: I. BALANCED DESIGNS¹
By John W. Tukey
Princeton University 


1. Summary. Analyses of variance are sometimes intended to reveal informa- 
tion about means (when tests of significance and, better, confidence procedures 
are appropriate). At other times analyses of variance have the purpose indicated 
by their name: to estimate the sizes of the various components contributed to 
the over-all variance from the corresponding sources. If we make certain as- 
sumptions of independence and normality for all of the quantities involved, it 
is easy to obtain formulas for the variances of the natural estimates of these 
variance components. The utility of these estimates can be called in question 
on the grounds of three sorts of assumptions: of certain amounts of independence, 
of infinite populations, of normality of distribution. This paper treats of the 
case where the latter two of these assumptions are removed, leaving only the 
customary (and dangerous) independence assumptions (as do the next two 
papers in this series). 

The treatment makes intensive use of polykays (which were introduced in 
[1], although that name was not used, and discussed in [2]) and is applied spe- 
cifically to balanced single and double classifications, to Latin squares, and to 
balanced incomplete blocks. A general definition of balance for an analysis of 
variance situation is given, and the general application of the technique to 
balanced situations is set forth. An application to a less simple example of a 
balanced single classification concludes the paper. 


2. Introduction to polykays. In order to deal easily and effectively with problems involving random samples from finite populations, the writer emphasized in [1] certain homogeneous polynomial symmetric functions of a finite set of numbers. These are of two sorts: (i) the brackets or symmetric means, exemplified by

$$\langle 12 \rangle = \frac{1}{n(n-1)} \sum x_i x_j^2,$$

where the summation is over the $n(n - 1)$ pairs $(i, j)$, with $i \ne j$; and (ii) the parentheses or polykays, exemplified by

$$(12) = k_{12} = k_1 k_2 - \frac{1}{n} k_3 = \langle 12 \rangle - \langle 111 \rangle.$$

Each set can be expressed linearly in terms of the other with constant coefficients. (Elsewhere, [1], [2], we use (12) as an alternate to $k_{12}$, but in the present

Received April 6, 1955. 

¹ Prepared in connection with research sponsored by the Office of Naval Research and based on part of Memorandum Report 45, “Finite Sampling Simplified,” Statistical Research Group, Princeton University, which was written while the author was a Fellow of the John Simon Guggenheim Foundation.





VARIANCE COMPONENTS I 


paper it will be simpler not to use this alternative notation.) The facts about 
polykays which we shall need are the following: 

(A) Any polynomial symmetric function of a finite set of numbers can be 
expressed linearly in the polykays of that set. 

(B) The average, over all random samples drawn from a finite population 
of numbers, of any polynomial in the values of the sample can be expressed 
linearly in the polykays of the population with coefficients which do not involve 
the size of the population. 

(C) If adding a constant to all the numbers of the set in (A) or the finite 
population in (B) leaves the polynomial invariant, then the coefficients of all 
polykays with one or more subscripts “1” vanish.

(D) Any polynomial function of several finite sets of numbers which is sym- 
metric in each of the sets separately can be expressed in terms of products of 
polykays from the various sets, the polykays of each set entering, at most, 
linearly. 

(E) The average, over all sets of random samples from the respective finite 
populations of numbers, of any polynomial in the values of these samples can 
be expressed in terms of products of polykays from the various sets, the poly- 
kays of each set entering at most linearly, with coefficients which do not in- 
volve the sizes of the populations. 

(F) If adding a constant to all the numbers, of one set in (D) or of one finite 
population in (E), leaves the polynomial invariant, then the coefficients of all 
products involving a polykay of the corresponding set, or population, which 
has one or more subscripts “1” vanish.

(G) The following formula holds:

$$(k_2)^2 = k_{22} + \frac{1}{n} k_4 + \frac{2}{n-1} k_{22},$$

where $n$ is the size of the set, or population, for which $k_2$, $k_{22}$, $k_4$ are some of the polykays.

(H) For a set made up of $n - 1$ zeros and one (nonzero) value, $t$, all brackets and polykays with more than one index vanish, and the rest are given by

$$k_p = \langle p \rangle = \frac{t^p}{n}.$$
n 


The proofs of most of these statements can be easily disposed of by simple 
argument or by reference. 

Thus, (A) implies (B), and (D) implies (E), because the average of a poly- 
nomial over all random samples is a symmetric polynomial in the values of the 
finite population. Every symmetric polynomial can be written linearly in terms 
of symmetric means, and every symmetric mean can be written linearly in terms 
of polykays (the actual formulas for degree ≤ 4, the highest with which we
shall be concerned here, are given in [2]), so that (A) holds. A similar argument 
disposes of (D). The argument establishing (C) and (F) is given in Section 11. 
(G) appears in [1], page 516. And, finally, if only one value is nonzero, all sym- 





JOHN W. TUKEY 


metric means with two or more indices vanish, and since the expression of a 
polykay in terms of symmetric means involves only symmetric means with at 
least as many indices (see [2]; the actual formulas for degree ≤ 4 also appear
in [1]), the same is true of polykays. The values of the one-index brackets and 
the polykays then follow by direct calculation. 
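These identities are easy to check by brute force. The following sketch (an illustration, not Tukey's procedure) computes symmetric means by averaging over ordered tuples of distinct indices, computes the ordinary k-statistics $k_1$, $k_2$, $k_3$ from central sums, and confirms $k_{12} = k_1k_2 - k_3/n = \langle 12 \rangle - \langle 111 \rangle$ in exact rational arithmetic. The data values are arbitrary.

```python
from fractions import Fraction
from itertools import permutations


def bracket(xs, powers):
    """Symmetric mean <p1 p2 ...>: average of x_i**p1 * x_j**p2 * ... over all
    ordered tuples (i, j, ...) of distinct indices."""
    total, count = Fraction(0), 0
    for idx in permutations(range(len(xs)), len(powers)):
        term = Fraction(1)
        for i, p in zip(idx, powers):
            term *= xs[i] ** p
        total += term
        count += 1
    return total / count


def k_stats(xs):
    """Ordinary k-statistics k_1, k_2, k_3 of a finite set (n >= 3)."""
    n = len(xs)
    m = sum(xs, Fraction(0)) / n
    s2 = sum((x - m) ** 2 for x in xs)
    s3 = sum((x - m) ** 3 for x in xs)
    return m, s2 / (n - 1), n * s3 / ((n - 1) * (n - 2))


xs = [Fraction(v) for v in (3, -1, 4, 1, 5)]   # arbitrary illustrative data
n = len(xs)
k1, k2, k3 = k_stats(xs)
k12_from_k = k1 * k2 - k3 / n
k12_from_brackets = bracket(xs, (1, 2)) - bracket(xs, (1, 1, 1))
print(k12_from_k, k12_from_brackets)
```

Replacing `xs` by a set with a single nonzero value, such as `[0, 0, 0, 3]`, gives $k_1, k_2, k_3 = 3/4, 9/4, 27/4$ and zero for every bracket with two or more indices, as (H) states.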


3. The variance of a sample variance. If $x_1, x_2, \cdots, x_n$ are a sample of $n$, their variance

$$s^2 = \frac{1}{n-1} \sum (x_i - \bar{x})^2$$


is one of the most familiar statistics. Its variance in sampling, first from in- 
finite and later from finite populations, has been derived by many writers. A 
derivation using polykays is presented in [1], page 517. In principle, analogous 
processes can be used for the variances of more complex expressions, but the 
algebra can be avoided by taking another path. We illustrate this path now 
for the variance of the variance. 

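We deal, then, with a sample of size $n$ and polykays $k_1, k_{11}, k_2, \cdots$ from a population of size $N$ and polykays $k'_1, k'_{11}, k'_2, \cdots$. We have $s^2 = k_2$ and

$$\operatorname{var} s^2 = \operatorname{var} k_2 = \operatorname{ave} k_2^2 - (\operatorname{ave} k_2)^2,$$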


where “ave” and “var” refer to average values and variances for all samples of $n$ from the population. Now $\operatorname{ave} k_2^2$ is a homogeneous polynomial of degree 4. Moreover, adding a constant to all $x$'s leaves $k_2$, and also $k_2^2$, invariant. According to (C), therefore, we can express $\operatorname{ave} k_2^2$ linearly in terms of population polykays which do not involve any index “1”, and the coefficients will not involve $N$. Hence, $\operatorname{ave} k_2^2$ is of the form

$$\phi_1(n) k'_4 + \psi_1(n) k'_{22},$$

while $(\operatorname{ave}\{k_2\})^2 = (k'_2)^2$ is of the form

$$\left(\phi_0(n) + \frac{1}{N}\right) k'_4 + \left(\psi_0(n) + \frac{2}{N-1}\right) k'_{22}$$

(where actually $\phi_0(n) = 0$, $\psi_0(n) = 1$). Hence,

$$\operatorname{var}(k_2) = \left(\phi(n) - \frac{1}{N}\right) k'_4 + \left(\psi(n) - \frac{2}{N-1}\right) k'_{22},$$

where $\phi(n) = \phi_1(n) - \phi_0(n)$, $\psi(n) = \psi_1(n) - \psi_0(n)$.
Now consider the case $n = N$, where the sample consists of the whole population, and $k_2$ is constant. We have

$$0 = \left(\phi(n) - \frac{1}{n}\right) k'_4 + \left(\psi(n) - \frac{2}{n-1}\right) k'_{22},$$

and since $k'_4$ and $k'_{22}$ do not satisfy any linear identity, we must have

$$\phi(n) - \frac{1}{n} = 0, \qquad \psi(n) - \frac{2}{n-1} = 0,$$






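so that the variance of $k_2$ is

$$\operatorname{var}(k_2) = \left(\frac{1}{n} - \frac{1}{N}\right) k'_4 + \left(\frac{2}{n-1} - \frac{2}{N-1}\right) k'_{22},$$

as before.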
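The finite-population result can be confirmed by exhaustive enumeration on a tiny population. In the sketch below the population polykays $k'_4$ and $k'_{22}$ are computed from the usual k-statistic formula for $k_4$ and from (G) solved for $k'_{22}$. The population values are hypothetical, and exact fractions make the comparison an equality rather than an approximation.

```python
from fractions import Fraction
from itertools import combinations


def k2(xs):
    """k_2 = sum of squared deviations / (len - 1), in exact arithmetic."""
    m = sum(xs, Fraction(0)) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def k4_k22(xs):
    """Polykays k_4 and k_22 of a finite set of N >= 4 numbers: k_4 from the
    usual k-statistic formula, k_22 by solving formula (G) for it."""
    N = len(xs)
    m = sum(xs, Fraction(0)) / N
    m2 = sum((x - m) ** 2 for x in xs) / N
    m4 = sum((x - m) ** 4 for x in xs) / N
    k4 = N**2 * ((N + 1) * m4 - 3 * (N - 1) * m2**2) / ((N - 1) * (N - 2) * (N - 3))
    k22 = (k2(xs) ** 2 - k4 / N) * Fraction(N - 1, N + 1)
    return k4, k22


def exact_var_k2(pop, n):
    """Variance of k_2 over all equally likely samples of n drawn without replacement."""
    values = [k2([pop[i] for i in idx]) for idx in combinations(range(len(pop)), n)]
    mean = sum(values) / len(values)
    return sum(v * v for v in values) / len(values) - mean**2


def formula_var_k2(pop, n):
    """(1/n - 1/N) k'_4 + (2/(n-1) - 2/(N-1)) k'_22."""
    N = len(pop)
    K4, K22 = k4_k22(pop)
    return (Fraction(1, n) - Fraction(1, N)) * K4 + (Fraction(2, n - 1) - Fraction(2, N - 1)) * K22


pop = [3, -1, 4, 1, 5, 9, 2]   # hypothetical finite population, N = 7
for n in range(2, len(pop) + 1):
    print(n, exact_var_k2(pop, n), formula_var_k2(pop, n))
```

At $n = N$ both columns are zero, which is exactly the condition used above to fix $\phi(n)$ and $\psi(n)$.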

This method of evaluating both “finite population corrections” and some of the other coefficients will extend easily to the standard analysis of variance situations, as we shall see. To it we shall need to add the use of minimal unit populations, by which we mean finite populations whose values are all zero except for one value of unity and whose size is as small as possible for the situation considered. By (H), with $t = 1$, such a population of size $n$ has vanishing polykays except for

$$k'_p = \frac{1}{n}, \qquad p = 1, 2, \cdots.$$

This is the opposite of the infinite normal population for which

$$k'_2 = \sigma^2, \quad k'_4 = 0, \quad k'_{22} = \sigma^4.$$
4. The balanced single classification. We now tackle the simplest model which we know how to specify for the balanced single classification, namely, the model with two finite populations:

$$x_{ij} = \mu + \eta_i + \omega_{ij}, \qquad i = 1, 2, \cdots, c, \quad j = 1, 2, \cdots, r,$$

$\{\eta_i\}$ sampled from $n$, $k'_1$, $k'_{11}$, $\cdots$,
$\{\omega_{ij}\}$ sampled from $N$, $K'_1$, $K'_{11}$, $\cdots$;
sampling independent, order randomized.

(We shall omit the primes from both sets of the polykays for simplicity and convenience.)

Let $A$ be an estimate of the “between” variance $k_2$, which is a homogeneous quadratic function of the $x_{ij}$ and is unbiased in mean. Let $B$ be an estimate of the “within” variance $K_2$, with similar properties. Then, we may conclude that

$$\operatorname{var}(A), \quad \operatorname{cov}(A, B), \quad \operatorname{var}(B)$$

all have the form

$$\alpha k_4 + \beta k_{22} + \gamma k_2 K_2 + \delta K_4 + \varepsilon K_{22}.$$

Our task is to determine the three sets of $\alpha$, $\beta$, $\gamma$, $\delta$, and $\varepsilon$.
Just as in the second method of treating the variance of the variance, the sizes of populations can only enter through the correction terms arising from

$$k_2^2 = k_{22} + \frac{1}{n} k_4 + \frac{2}{n-1} k_{22}, \qquad k_2 K_2 = k_2 K_2, \qquad K_2^2 = K_{22} + \frac{1}{N} K_4 + \frac{2}{N-1} K_{22}.$$







In particular, the terms in n and N are the same for any quadratic unbiased 
estimates. 

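But let us take the usual estimates obtained from an analysis of variance as $A$ and $B$. Then, we know that the $\eta_i$ vanish identically from $B$, and hence that $k_4$, $k_{22}$, and $k_2 K_2$ cannot appear in $\operatorname{var}(B)$ or $\operatorname{cov}(A, B)$. Thus, we have

$$\operatorname{var}(A) = \left(\alpha_1 - \frac{1}{n}\right) k_4 + \left(\beta_1 - \frac{2}{n-1}\right) k_{22} + \gamma_1 k_2 K_2 + \delta_1 K_4 + \varepsilon_1 K_{22},$$
$$\operatorname{cov}(A, B) = \delta_0 K_4 + \varepsilon_0 K_{22},$$
$$\operatorname{var}(B) = \left(\delta_2 - \frac{1}{N}\right) K_4 + \left(\varepsilon_2 - \frac{2}{N-1}\right) K_{22},$$

where $\alpha_1, \beta_1, \cdots, \varepsilon_2$ are independent of $n$ and $N$.
Now take $\omega_{ij} \equiv 0$, that is, take $K_4 = K_{22} = K_2 = 0$, and $n = c$, so that every $\eta_i$ is always used. Then $A$ is constant, and we see that $\alpha_1 = 1/c$, $\beta_1 = 2/(c - 1)$. Start again, take the $\eta$'s $\equiv 0$ and take a minimal unit population (of size $rc$) for the $\omega$'s. Then, one and only one $x$ will be unity, the others will be zero. $A$ and $B$ will be constant, but $K_4 \ne 0$ while $k_4$, $k_{22}$, $k_2$, $K_{22}$ all vanish. Hence,

$$\delta_1 = 0, \qquad \delta_0 = 0, \qquad \delta_2 - \frac{1}{rc} = 0.$$

We have now reduced our variances to the form

$$\operatorname{var}(A) = \left(\frac{1}{c} - \frac{1}{n}\right) k_4 + \left(\frac{2}{c-1} - \frac{2}{n-1}\right) k_{22} + \gamma_1 k_2 K_2 + \varepsilon_1 K_{22},$$
$$\operatorname{cov}(A, B) = \varepsilon_0 K_{22},$$
$$\operatorname{var}(B) = \left(\frac{1}{rc} - \frac{1}{N}\right) K_4 + \left(\varepsilon_2 - \frac{2}{N-1}\right) K_{22}.$$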


We have four more coefficients to determine. We could find them in two steps by (i) introducing minimal unit populations for both $\eta$'s and $\omega$'s, considering the two cases which arise, and finding $\gamma_1$; and then (ii) letting the $\eta$'s vanish and introducing a minimal population with two nonzero elements for the $\omega$'s so as to determine the remaining coefficients. It seems simplest, however, to fall back on normal theory.

It is well known that when $\eta$'s and $\omega$'s are drawn from (infinite) normal populations, the mean squares are distributed like multiples of chi-square. Hence, we have

Mean square   Average          Variance                        Covariance
Between       $K_2 + rk_2$     $2(K_2 + rk_2)^2/(c - 1)$       0
Within        $K_2$            $2K_2^2/[c(r - 1)]$







The within component is the within mean square, and for infinite populations, $K_{22} = K_2^2$, so that we have found that

$$\varepsilon_2 = \frac{2}{c(r - 1)}.$$

Observing that the between variance component is

$$\frac{1}{r}(\text{MS between} - \text{MS within}),$$

we find the covariance between the two components to be

$$-\frac{1}{r} \operatorname{var}\{\text{MS within}\} = -\frac{2}{cr(r - 1)} K_2^2,$$

so that

$$\varepsilon_0 = -\frac{2}{cr(r - 1)},$$


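and the variance of the between component to be

$$\frac{1}{r^2}\left[\frac{2(K_2 + rk_2)^2}{c - 1} + \frac{2K_2^2}{c(r - 1)}\right] = \frac{2k_2^2}{c - 1} + \frac{4k_2K_2}{r(c - 1)} + \frac{2}{r^2}\left(\frac{1}{c - 1} + \frac{1}{c(r - 1)}\right) K_2^2,$$

so that, again, since $K_{22} = K_2^2$ for the normal distribution,

$$\gamma_1 = \frac{4}{r(c - 1)}, \qquad \varepsilon_1 = \frac{2}{r^2}\left(\frac{1}{c - 1} + \frac{1}{c(r - 1)}\right) = \frac{2(rc - 1)}{cr^2(c - 1)(r - 1)}.$$

Our final results, then, are

$$\operatorname{var}(\text{between}) = \left(\frac{1}{c} - \frac{1}{n}\right) k_4 + \left(\frac{2}{c - 1} - \frac{2}{n - 1}\right) k_{22} + \frac{4}{r(c - 1)} k_2 K_2 + \frac{2(rc - 1)}{r^2 c(r - 1)(c - 1)} K_{22},$$

$$\operatorname{cov} = -\frac{2}{rc(r - 1)} K_{22},$$

$$\operatorname{var}(\text{within}) = \left(\frac{1}{rc} - \frac{1}{N}\right) K_4 + \left(\frac{2}{c(r - 1)} - \frac{2}{N - 1}\right) K_{22}.$$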


These are reasonably simple formulas and are entirely free of assumptions of normality of distribution and infinity of population. They retain, however, an assumption of an independence character, namely, that the $\eta$'s and $\omega$'s are independently drawn and allotted.
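The final formulas can be checked the same way. The sketch below enumerates every ordered draw of $c$ values $\eta_i$ from a population of size $n$ and every ordered allotment of $rc$ values $\omega_{ij}$ from a population of size $N$ (both populations hypothetical and deliberately tiny), forms the usual between and within estimates, and compares their exact variances and covariance with the three formulas. Exact fractions are used throughout.

```python
from fractions import Fraction
from itertools import permutations


def k2(xs):
    m = sum(xs, Fraction(0)) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def k4_k22(xs):
    """Population polykays k_4 and k_22 (N >= 4); k_22 from formula (G)."""
    N = len(xs)
    m = sum(xs, Fraction(0)) / N
    m2 = sum((x - m) ** 2 for x in xs) / N
    m4 = sum((x - m) ** 4 for x in xs) / N
    k4 = N**2 * ((N + 1) * m4 - 3 * (N - 1) * m2**2) / ((N - 1) * (N - 2) * (N - 3))
    return k4, (k2(xs) ** 2 - k4 / N) * Fraction(N - 1, N + 1)


def between_within(x):
    """Estimates (A, B) = (between, within) components for c rows of r values."""
    c, r = len(x), len(x[0])
    means = [sum(row, Fraction(0)) / r for row in x]
    grand = sum(means) / c
    ms_between = r * sum((m - grand) ** 2 for m in means) / (c - 1)
    ms_within = sum((v - means[i]) ** 2 for i, row in enumerate(x) for v in row) / (c * (r - 1))
    return (ms_between - ms_within) / r, ms_within


def exact_moments(eta_pop, omega_pop, c, r):
    """(var A, cov(A, B), var B) over every ordered draw of eta's and omega's."""
    pairs = []
    for eta in permutations(eta_pop, c):
        for om in permutations(omega_pop, r * c):
            x = [[eta[i] + om[i * r + j] for j in range(r)] for i in range(c)]
            pairs.append(between_within(x))
    T = len(pairs)
    ea = sum(a for a, _ in pairs) / T
    eb = sum(b for _, b in pairs) / T
    return (sum(a * a for a, _ in pairs) / T - ea**2,
            sum(a * b for a, b in pairs) / T - ea * eb,
            sum(b * b for _, b in pairs) / T - eb**2)


def tukey_moments(eta_pop, omega_pop, c, r):
    """The final results of Section 4."""
    F, n, N = Fraction, len(eta_pop), len(omega_pop)
    k4, k22 = k4_k22(eta_pop)
    K4, K22 = k4_k22(omega_pop)
    var_a = ((F(1, c) - F(1, n)) * k4 + (F(2, c - 1) - F(2, n - 1)) * k22
             + F(4, r * (c - 1)) * k2(eta_pop) * k2(omega_pop)
             + F(2 * (r * c - 1), r * r * c * (r - 1) * (c - 1)) * K22)
    cov = -F(2, r * c * (r - 1)) * K22
    var_b = (F(1, r * c) - F(1, N)) * K4 + (F(2, c * (r - 1)) - F(2, N - 1)) * K22
    return var_a, cov, var_b


eta_pop, omega_pop = [0, 1, 3, 7], [0, 2, 3, 5, 11]   # hypothetical populations
print(exact_moments(eta_pop, omega_pop, 2, 2) == tukey_moments(eta_pop, omega_pop, 2, 2))
```

The enumeration grows quickly with $r$, $c$, and the population sizes, so this kind of check is only practical for very small layouts.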

Clearly, variances and covariances of components for at least certain balanced 
designs can most easily be found by combining (1) the finite population terms, 
(2) the effects of minimal unit populations taken one at a time, (3) normal 
theory. We shall do this in a few cases. 


5. Simplest double classification. We now go on to the simplest row-by-column model, one without explicitly identified interaction, where

$$x_{ij} = \mu + \eta_i + \xi_j + \omega_{ij}, \qquad i = 1, 2, \cdots, c, \quad j = 1, 2, \cdots, r,$$

$\eta_i$ from $n$, $k_1$, $k_{11}$, $\cdots$,
$\xi_j$ from $n^*$, $k^*_1$, $k^*_{11}$, $\cdots$,
$\omega_{ij}$ from $N$, $K_1$, $K_{11}$, $\cdots$,

independently and randomly sampled and arranged.

We have components for columns (associated with the $\eta$'s), rows (associated with the $\xi$'s), and residual (associated with the $\omega$'s; also called “interaction,” “discrepance,” or “error”). Making use of the same principle for allotting finite population terms and zero coefficients, we may write down the general structure of their variances and covariances in the following forms:


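$$\operatorname{var}\{\text{cols}\} = \left(\alpha_1 - \frac{1}{n}\right) k_4 + \left(\beta_1 - \frac{2}{n - 1}\right) k_{22} + \gamma_1 k_2 K_2 + \delta_1 K_4 + \varepsilon_1 K_{22},$$
$$\operatorname{var}\{\text{rows}\} = \left(\alpha_2 - \frac{1}{n^*}\right) k^*_4 + \left(\beta_2 - \frac{2}{n^* - 1}\right) k^*_{22} + \gamma_2 k^*_2 K_2 + \delta_2 K_4 + \varepsilon_2 K_{22},$$
$$\operatorname{cov}\{\text{rows}, \text{cols}\} = \delta_3 K_4 + \varepsilon_3 K_{22},$$
$$\operatorname{var}\{\text{res}\} = \left(\delta_4 - \frac{1}{N}\right) K_4 + \left(\varepsilon_4 - \frac{2}{N - 1}\right) K_{22},$$
$$\operatorname{cov}\{\text{cols}, \text{res}\} = \delta_5 K_4 + \varepsilon_5 K_{22},$$
$$\operatorname{cov}\{\text{rows}, \text{res}\} = \delta_6 K_4 + \varepsilon_6 K_{22}.$$

If any one population is a minimal unit population (that is, if $n = c$, $n^* = r$, or $N = rc$) and the others are constant, then all parts of the analysis of variance, and hence all of the estimates of variance components, are easily seen to be constant under randomization. Hence, we see that

$$\alpha_1 = \frac{1}{c}, \quad \alpha_2 = \frac{1}{r}, \quad \delta_1 = \delta_2 = \delta_3 = \delta_5 = \delta_6 = 0, \quad \delta_4 = \frac{1}{rc}.$$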







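Now, taking normal theory, we have for the mean squares

Mean square   Mean               Variance                           Covariances
MS{cols}      $K_2 + rk_2$       $2(K_2 + rk_2)^2/(c - 1)$          0 with MS{rows}, 0 with MS{res}
MS{rows}      $K_2 + ck^*_2$     $2(K_2 + ck^*_2)^2/(r - 1)$        0 with MS{res}
MS{res}       $K_2$              $2K_2^2/[(r - 1)(c - 1)]$

From this, we easily derive the corresponding table for estimates of variance components:

Component   Mean       Variance                                                                 Covariances
cols        $k_2$      $(1/r^2)\{2(K_2 + rk_2)^2/(c - 1) + 2K_2^2/[(r - 1)(c - 1)]\}$           with rows: $(1/rc) \cdot 2K_2^2/[(r - 1)(c - 1)]$; with res: $-(1/r) \cdot 2K_2^2/[(r - 1)(c - 1)]$
rows        $k^*_2$    $(1/c^2)\{2(K_2 + ck^*_2)^2/(r - 1) + 2K_2^2/[(r - 1)(c - 1)]\}$         with res: $-(1/c) \cdot 2K_2^2/[(r - 1)(c - 1)]$
res         $K_2$      $2K_2^2/[(r - 1)(c - 1)]$

whence we see that

$$\beta_1 = \frac{2}{c - 1}, \quad \beta_2 = \frac{2}{r - 1}, \quad \gamma_1 = \frac{4}{r(c - 1)}, \quad \gamma_2 = \frac{4}{c(r - 1)},$$
$$\varepsilon_1 = \frac{2}{r^2}\left(\frac{1}{c - 1} + \frac{1}{(r - 1)(c - 1)}\right) = \frac{2}{r(r - 1)(c - 1)},$$
$$\varepsilon_2 = \frac{2}{c^2}\left(\frac{1}{r - 1} + \frac{1}{(r - 1)(c - 1)}\right) = \frac{2}{c(r - 1)(c - 1)},$$
$$\varepsilon_3 = \frac{2}{rc(r - 1)(c - 1)}, \quad \varepsilon_4 = \frac{2}{(r - 1)(c - 1)}, \quad \varepsilon_5 = -\frac{2}{r(r - 1)(c - 1)}, \quad \varepsilon_6 = -\frac{2}{c(r - 1)(c - 1)},$$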









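thus completing the calculation. The final answers are, then,

$$\operatorname{var}\{\text{cols}\} = \left(\frac{1}{c} - \frac{1}{n}\right) k_4 + \left(\frac{2}{c - 1} - \frac{2}{n - 1}\right) k_{22} + \frac{4}{r(c - 1)} k_2 K_2 + \frac{2}{r(r - 1)(c - 1)} K_{22},$$
$$\operatorname{var}\{\text{rows}\} = \left(\frac{1}{r} - \frac{1}{n^*}\right) k^*_4 + \left(\frac{2}{r - 1} - \frac{2}{n^* - 1}\right) k^*_{22} + \frac{4}{c(r - 1)} k^*_2 K_2 + \frac{2}{c(r - 1)(c - 1)} K_{22},$$
$$\operatorname{cov}\{\text{cols}, \text{rows}\} = \frac{2}{rc(r - 1)(c - 1)} K_{22},$$
$$\operatorname{var}\{\text{res}\} = \left(\frac{1}{rc} - \frac{1}{N}\right) K_4 + \left(\frac{2}{(r - 1)(c - 1)} - \frac{2}{N - 1}\right) K_{22},$$
$$\operatorname{cov}\{\text{cols}, \text{res}\} = -\frac{2}{r(r - 1)(c - 1)} K_{22},$$
$$\operatorname{cov}\{\text{rows}, \text{res}\} = -\frac{2}{c(r - 1)(c - 1)} K_{22}.$$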
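For the double classification a floating-point enumeration keeps the run time short. The sketch below uses hypothetical, deliberately tiny populations for the $\eta$'s, $\xi$'s, and $\omega$'s, enumerates every equally likely configuration (the reduction to unordered $\eta$ and $\xi$ draws is justified in a comment), and compares the six exact moments with the final answers.

```python
from itertools import combinations, permutations


def k2(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def k4_k22(xs):
    """Population polykays k_4 and k_22 (size >= 4); k_22 from formula (G)."""
    N = len(xs)
    m = sum(xs) / N
    m2 = sum((x - m) ** 2 for x in xs) / N
    m4 = sum((x - m) ** 4 for x in xs) / N
    k4 = N**2 * ((N + 1) * m4 - 3 * (N - 1) * m2**2) / ((N - 1) * (N - 2) * (N - 3))
    return k4, (k2(xs) ** 2 - k4 / N) * (N - 1) / (N + 1)


def estimates(x):
    """(cols, rows, res) estimates for a layout x[i][j], i over c columns, j over r rows."""
    c, r = len(x), len(x[0])
    col = [sum(x[i]) / r for i in range(c)]
    row = [sum(x[i][j] for i in range(c)) / c for j in range(r)]
    g = sum(col) / c
    ms_cols = r * sum((m - g) ** 2 for m in col) / (c - 1)
    ms_rows = c * sum((m - g) ** 2 for m in row) / (r - 1)
    ms_res = sum((x[i][j] - col[i] - row[j] + g) ** 2
                 for i in range(c) for j in range(r)) / ((r - 1) * (c - 1))
    return (ms_cols - ms_res) / r, (ms_rows - ms_res) / c, ms_res


def exact_moments(eta_pop, xi_pop, omega_pop, c, r):
    # Unordered eta and xi draws suffice: relabelling columns (or rows) only permutes
    # the equally likely omega arrangements and leaves every mean square unchanged.
    draws = [estimates([[eta[i] + xi[j] + om[i * r + j] for j in range(r)] for i in range(c)])
             for eta in combinations(eta_pop, c)
             for xi in combinations(xi_pop, r)
             for om in permutations(omega_pop, r * c)]
    T = len(draws)
    mean = [sum(d[k] for d in draws) / T for k in range(3)]

    def cov(u, v):
        return sum(d[u] * d[v] for d in draws) / T - mean[u] * mean[v]

    return {"cols": cov(0, 0), "rows": cov(1, 1), "res": cov(2, 2),
            "cols,rows": cov(0, 1), "cols,res": cov(0, 2), "rows,res": cov(1, 2)}


def tukey_moments(eta_pop, xi_pop, omega_pop, c, r):
    """The final answers of Section 5."""
    n, ns, N = len(eta_pop), len(xi_pop), len(omega_pop)
    (k4, k22), (k4s, k22s), (K4, K22) = k4_k22(eta_pop), k4_k22(xi_pop), k4_k22(omega_pop)
    K2, d = k2(omega_pop), (r - 1) * (c - 1)
    return {
        "cols": (1 / c - 1 / n) * k4 + (2 / (c - 1) - 2 / (n - 1)) * k22
        + 4 / (r * (c - 1)) * k2(eta_pop) * K2 + 2 / (r * d) * K22,
        "rows": (1 / r - 1 / ns) * k4s + (2 / (r - 1) - 2 / (ns - 1)) * k22s
        + 4 / (c * (r - 1)) * k2(xi_pop) * K2 + 2 / (c * d) * K22,
        "res": (1 / (r * c) - 1 / N) * K4 + (2 / d - 2 / (N - 1)) * K22,
        "cols,rows": 2 / (r * c * d) * K22,
        "cols,res": -2 / (r * d) * K22,
        "rows,res": -2 / (c * d) * K22,
    }


exact = exact_moments([0, 1, 3, 7], [0, 2, 5, 6], [0, 1, 4, 4, 9], 2, 2)
formula = tukey_moments([0, 1, 3, 7], [0, 2, 5, 6], [0, 1, 4, 4, 9], 2, 2)
for name in formula:
    print(name, round(exact[name], 10), round(formula[name], 10))
```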

6. The Latin square. The next example in order of complexity is the Latin square, for whose side we use $k$ to avoid confusion with $n$. Since rows, columns, and treatments enter symmetrically in a fully randomized model of the sort we are discussing, we need not treat them separately.

We can write down the formulas almost at once by analogy with those just given. The main effect considered (columns, rows, or treatments) is sampled from $n$, $k_1$, $k_{11}$, $\cdots$, while the cell contribution is sampled from $N$, $K_1$, $K_{11}$, $\cdots$. Then,

$$\operatorname{var}\{\text{main}\} = \left(\frac{1}{k} - \frac{1}{n}\right) k_4 + \left(\frac{2}{k - 1} - \frac{2}{n - 1}\right) k_{22} + \frac{4}{k(k - 1)} k_2 K_2 + \frac{2}{k^2(k - 2)} K_{22},$$
$$\operatorname{cov}\{\text{main}, \text{main}^*\} = \frac{2}{k^2(k - 1)(k - 2)} K_{22},$$
$$\operatorname{cov}\{\text{main}, \text{res}\} = -\frac{2}{k(k - 1)(k - 2)} K_{22}.$$


7. Balanced incomplete blocks. A moment’s computation shows that a single minimal unit population, with the others constant, still leads to constant analyses of variance. Hence, the terms in $K_4$ still vanish, while those in $k_4$ and $k_{22}$ have their usual simple form.


If we have
$b$ blocks, $v$ varieties, $r$ replications,
and hence
$vr/b$ plots per block,
and if the
varieties are from $n$, $k_1$, $k_{11}$, $\cdots$,
blocks are from $n^*$, $k^*_1$, $k^*_{11}$, $\cdots$, and
fluctuations are from $N$, $K_1$, $K_{11}$, $\cdots$,
and if we recall that the analysis of variance runs





DI AvMS 
Varietic l ] Ks. + rh 
° vr , * 
Blocks b ] Ke. + % > 
Residue t l 65+ 2 K 
(AvMS = average mean square), then we see that
| | 2 2 } . 
Val ) Vars ( aes Ig 4 ( + ) k , ho K 
’ n in | n— | riv — 1) 
2(v7 — b | . 
is — x Ke 
r(v — lier — v — b+ 2) 
| | 2 ay | tb 
var {blocks ( ——)it+(- i ) kh 4 eK 
h n h— |] n™* — ] vr(b — 1) 
2Qvr — v + 1)b : 
ee ste Kes, 
vr(b — 1)\(vr — b — vy + 2) 
{1 | ) ( 2 2 : 
var )res ~ Ky 4 ——— — — — K2, 
ar An we (Wp * TAS = pc bt Ee P| 
2b : 
cov }vars, blocks} = ————_—- Kn, 
vw(er —v — b+ 2 
2 ; 
Ccoy\ ) Vars, res}  . ne kK ’ 


COV a locks, res} = - ~ Ke. 







8. The general notion of balance. We are familiar with the idea of an analysis 
of variance separated into “lines” such that there are one or more kinds of 
contribution peculiar to each line. The notion of a line A falling “below” a line 
B is clearly understood by most expert practitioners, although it is not often 
discussed in print. For our present purposes, it will suffice to say that A is 
below B if the variance component corresponding to A appears in the average 
value of the mean square for B with a nonzero coefficient, but the converse is 
not true. (Besides “A below B” and “B below A”, we could have “A beside
B,” when neither variance component appears in the other AvMS, or we could
have “A intertwined with B,” when both variance components appear in the 
other AvMS.) We shall be dealing both with individual lines and with groups 
of lines, which by convenient analogy we call paragraphs. 

In any specific analysis of variance which does not involve intertwined lines, 
if we fix our attention on a specific line, we can divide all the lines into three 
paragraphs: 

(a) The upper paragraph, containing lines above and beside the chosen line, 

(b) The chosen line, 

(c) The lower paragraph, containing lines below the chosen line. 

Some of these paragraphs may be empty. This division is based on average 
values of mean squares. (The implied inequalities need not carry over com- 
pletely to individual values of mean squares.) But stronger conditions may 
hold, as in the examples discussed above. In particular, 

(1) An arbitrary change in the contributions associated with the given line may have no effect on the mean squares in the upper paragraph in each and every particular analysis of variance.

(2) If all the contributions associated with the given line vanish except for 
one, and the contributions from the upper paragraph all vanish, then the mean 
squares in the upper paragraph may not depend on the location of the one 
nonzero contribution from the given line. 


Furthermore, if both of these contingencies occur, we shall say that the 
analysis of variance is balanced with respect to the given line. An analysis of 
variance balanced with respect to all of its lines is balanced. (Since the analyses 
of variance usually called “balanced” possess these properties, this definition
is an extension of previous usage.) 


9. Balanced cases in general. We can easily write down the variances and 
covariances of any variance component in such an analysis. It will involve 
three types of terms: 

(1) Terms in the $k_4$ and $k_{22}$ of the corresponding contributions,

(2) Cross-terms of the form $k_2 K_2$, where $K_2$ refers to some line from the lower paragraph,

(3) Terms in the $K_{22}$'s for these lower lines.

(There will be no terms in the $K_4$'s for lower lines because of condition (2) of the last Section applied to these lines.) Suppose there were no errors, no lower












contributions, then we should have measured the upper contributions of interest exactly, and must face a variance of the form

$$\left(\frac{1}{b} - \frac{1}{n}\right) k_4 + \left(\frac{2}{b - 1} - \frac{2}{n - 1}\right) k_{22},$$

where we have investigated $b$ out of $n$ cases. But these must be just the terms of the first type, since the others vanish with the errors and lower contributions. They will always be easy to write down.

The other terms are those which we found from normal theory. Let us illustrate in an imaginary example. Let us suppose that some design leads to an analysis of variance of the following shape:

Item   DF   AvMS
a      3    $\sigma^2 + 3\sigma_1^2 + 7\sigma_2^2$
b      7    $\sigma^2 + 4\sigma_1^2$
c      8    $\sigma^2$

Clearly, the combination of mean squares estimating $\sigma_2^2$ must be

$$\frac{4\,\text{MS}_a - 3\,\text{MS}_b - \text{MS}_c}{28},$$

and its normal theory variance is to be found from the chi-squared variance of

$$\frac{2\,(\text{average})^2}{\text{degrees of freedom}}$$

applied to each mean square, which yields

$$\frac{2 \cdot 4^2(\sigma^2 + 3\sigma_1^2 + 7\sigma_2^2)^2}{28^2 \cdot 3} + \frac{2 \cdot 3^2(\sigma^2 + 4\sigma_1^2)^2}{28^2 \cdot 7} + \frac{2 \cdot 1^2(\sigma^2)^2}{28^2 \cdot 8}$$
$$= \frac{2}{28^2}\{6.744\,\sigma^4 + 42.29\,\sigma^2\sigma_1^2 + 74.67\,\sigma^2\sigma_2^2 + 68.57\,\sigma_1^4 + 224\,\sigma_1^2\sigma_2^2 + 261.3\,\sigma_2^4\}.$$

These, then, are the terms of types (2) and (3), which can be easily written down in almost any balanced case.
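The bookkeeping in this example is mechanical, so it can be delegated to a few lines of code. The sketch below uses a helper of our own, with component names chosen for illustration ("e" for $\sigma^2$, "s1" and "s2" for $\sigma_1^2$ and $\sigma_2^2$). It treats each mean square as an independent multiple of chi-square, checks that the combination is unbiased for $\sigma_2^2$, and collects the exact coefficient of each product of components.

```python
from collections import defaultdict
from fractions import Fraction


def normal_theory_variance(lines, weights):
    """Normal-theory variance of sum_l weights[l] * MS_l.

    lines maps a line name to (df, avms), where avms maps a component name to
    its coefficient in the AvMS.  The mean squares are treated as independent
    with var(MS) = 2 * AvMS^2 / df.  Returns {(comp, comp): coefficient}.
    """
    out = defaultdict(Fraction)
    for name, w in weights.items():
        df, avms = lines[name]
        for u, cu in avms.items():
            for v, cv in avms.items():
                out[tuple(sorted((u, v)))] += Fraction(2) * w * w * cu * cv / df
    return dict(out)


def expectation(lines, weights):
    """Average value of the same combination, as {component: coefficient}."""
    out = defaultdict(Fraction)
    for name, w in weights.items():
        for comp, coef in lines[name][1].items():
            out[comp] += w * coef
    return {k: v for k, v in out.items() if v != 0}


lines = {
    "a": (3, {"e": 1, "s1": 3, "s2": 7}),
    "b": (7, {"e": 1, "s1": 4}),
    "c": (8, {"e": 1}),
}
weights = {"a": Fraction(4, 28), "b": Fraction(-3, 28), "c": Fraction(-1, 28)}

print(expectation(lines, weights))   # only the s2 component should survive
variance = normal_theory_variance(lines, weights)
for key, coef in sorted(variance.items()):
    print(key, float(coef * 28**2 / 2))   # coefficients inside the braces
```

Exact fractions show, for example, that the coefficient of $\sigma^4$ inside the braces is $16/3 + 9/7 + 1/8 = 1133/168 \approx 6.744$.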

As an illustration of these principles, we shall write down the variance of a main-effect variance component in a three-way (balanced) analysis with replication, the shape of which is

Line           DF                        AvMS
Treatments     $t - 1$                   $\sigma^2 + p\sigma_{rct}^2 + cp\sigma_{rt}^2 + rp\sigma_{ct}^2 + crp\sigma_t^2$
Rows           $r - 1$                   $\sigma^2 + p\sigma_{rct}^2 + cp\sigma_{rt}^2 + tp\sigma_{rc}^2 + ctp\sigma_r^2$
T × R          $(t - 1)(r - 1)$          $\sigma^2 + p\sigma_{rct}^2 + cp\sigma_{rt}^2$
T × C          $(t - 1)(c - 1)$          $\sigma^2 + p\sigma_{rct}^2 + rp\sigma_{ct}^2$
R × C          $(r - 1)(c - 1)$          $\sigma^2 + p\sigma_{rct}^2 + tp\sigma_{rc}^2$
T × R × C      $(r - 1)(c - 1)(t - 1)$   $\sigma^2 + p\sigma_{rct}^2$
Replication    $rct(p - 1)$              $\sigma^2$







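Clearly, $\sigma_t^2$ is estimated from

$$\frac{\text{MS}(T) - \text{MS}(T \times R) - \text{MS}(T \times C) + \text{MS}(T \times R \times C)}{crp},$$

and hence the variance of estimate is

$$\left(\frac{1}{t} - \frac{1}{n}\right) k_4 + \left(\frac{2}{t - 1} - \frac{2}{n - 1}\right) \sigma_t^4 + \frac{4\sigma_t^2\sigma_{ct}^2}{c(t - 1)} + \frac{4\sigma_t^2\sigma_{rt}^2}{r(t - 1)} + \frac{4\sigma_t^2\sigma_{rct}^2}{rc(t - 1)} + \frac{4\sigma_t^2\sigma^2}{rcp(t - 1)}$$
$$+ \frac{2\sigma_{ct}^4}{c(c - 1)(t - 1)} + \frac{2\sigma_{rt}^4}{r(r - 1)(t - 1)} + \frac{4\sigma_{ct}^2\sigma_{rt}^2}{rc(t - 1)} + \frac{4\sigma_{ct}^2\sigma_{rct}^2}{rc(c - 1)(t - 1)} + \frac{4\sigma_{ct}^2\sigma^2}{rcp(c - 1)(t - 1)}$$
$$+ \frac{4\sigma_{rt}^2\sigma_{rct}^2}{rc(r - 1)(t - 1)} + \frac{4\sigma_{rt}^2\sigma^2}{rcp(r - 1)(t - 1)} + \frac{2\sigma_{rct}^4}{c(c - 1)r(r - 1)(t - 1)} + \frac{4\sigma_{rct}^2\sigma^2}{c(c - 1)r(r - 1)(t - 1)p} + \frac{2\sigma^4}{c(c - 1)r(r - 1)(t - 1)p^2},$$

where $\sigma_t^4$, $\sigma_{ct}^4$, $\sigma_{rt}^4$, $\sigma_{rct}^4$, and $\sigma^4$ are understood to stand for the appropriate polykays (with subscripts “22”) in the event any of these populations are finite.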
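The same kind of helper reproduces the normal-theory part of this formula. In the sketch below the design sizes $t$, $r$, $c$, $p$ are hypothetical, the terms of type (1) (in $k_4$ and the finite correction attached to $\sigma_t^4$) are not computed, and the remaining coefficients are compared exactly with the closed forms in the display above. Component names ("t", "ct", "rt", "rct", "e" for $\sigma^2$) are our own labels.

```python
from collections import defaultdict
from fractions import Fraction as F


def normal_theory_variance(lines, weights):
    """Variance of sum_l weights[l] * MS_l with independent var(MS) = 2 AvMS^2 / df."""
    out = defaultdict(F)
    for name, w in weights.items():
        df, avms = lines[name]
        for u, cu in avms.items():
            for v, cv in avms.items():
                out[tuple(sorted((u, v)))] += 2 * w * w * cu * cv / F(df)
    return dict(out)


def key(u, v):
    return tuple(sorted((u, v)))


t, r, c, p = 4, 3, 5, 2   # hypothetical design sizes
lines = {
    "T": (t - 1, {"e": 1, "rct": p, "rt": c * p, "ct": r * p, "t": c * r * p}),
    "TxR": ((t - 1) * (r - 1), {"e": 1, "rct": p, "rt": c * p}),
    "TxC": ((t - 1) * (c - 1), {"e": 1, "rct": p, "ct": r * p}),
    "TxRxC": ((t - 1) * (r - 1) * (c - 1), {"e": 1, "rct": p}),
}
w = F(1, c * r * p)
computed = normal_theory_variance(lines, {"T": w, "TxR": -w, "TxC": -w, "TxRxC": w})

d = c * (c - 1) * r * (r - 1) * (t - 1)
closed_form = {
    key("t", "t"): F(2, t - 1),
    key("t", "ct"): F(4, c * (t - 1)),
    key("t", "rt"): F(4, r * (t - 1)),
    key("t", "rct"): F(4, r * c * (t - 1)),
    key("t", "e"): F(4, r * c * p * (t - 1)),
    key("ct", "ct"): F(2, c * (c - 1) * (t - 1)),
    key("rt", "rt"): F(2, r * (r - 1) * (t - 1)),
    key("ct", "rt"): F(4, r * c * (t - 1)),
    key("ct", "rct"): F(4, r * c * (c - 1) * (t - 1)),
    key("ct", "e"): F(4, r * c * p * (c - 1) * (t - 1)),
    key("rt", "rct"): F(4, r * c * (r - 1) * (t - 1)),
    key("rt", "e"): F(4, r * c * p * (r - 1) * (t - 1)),
    key("rct", "rct"): F(2, d),
    key("rct", "e"): F(4, d * p),
    key("e", "e"): F(2, d * p * p),
}
print(computed == closed_form)
```

The $\sigma_t^4$ entry, $2/(t - 1)$, is the infinite-population part of the $k_{22}$ coefficient; the correction $-2/(n - 1)$ and the $k_4$ term come from the type (1) argument.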


10. A further example. Let us show that the generality of the concept of balance is rather wider than one might suppose at first glance. Consider the case of a single classification with equal numbers of cases in each column, with each column subject to different fluctuations. The formal model runs as follows:

$$x_{ij} = \mu + \eta_i + \omega_{ij}, \qquad i = 1, 2, \cdots, c, \quad j = 1, 2, \cdots, r,$$

$\{\eta_i\}$ sampled from $n$, $k_1$, $k_{11}$, $\cdots$,
$\{\omega_{ij}\}$, for each $i$, sampled from $N_i$, $K_{1,i}$, $K_{11,i}$, $\cdots$,

independently and randomly sampled and arranged.

It is easy to verify that the conventional analysis, which does not take account of the differences in fluctuation between columns, is balanced and that the lines, degrees of freedom, and average mean squares are (writing $\sigma_\eta^2$ for $k_2$ and $\sigma_i^2$ for $K_{2,i}$) as follows:


Line DF 


Between 


Within Col 1 
Within Col 2 


Within Col ¢ 







The estimate of $\sigma_\eta^2$ will be found from

$$\frac{1}{r}\left[(\text{MS between}) - \frac{1}{c}\sum_i (\text{MS within}_i)\right],$$


and the corresponding variance is given by 


f 
t 
a 
> 

=. 
yy 
ll 
xy 
to 
x 
3 
| 
a 
— 
Il 
= 
mn 
+ 


aha eee ee ) kee 4 slg hoverg: (Nad 
Cc n/ \c l n ] ric | : 


and when we use 


this becomes 


}(2 - 2) + (25 - 24) tet ZAG ek 


2(re — 1) : tes 2 
, __2re—- 1) oe | 2 € posi ii 
(ec — l)e(r — 1I)r’ ‘| 7 E \ Ket N-1 Kx), 





where the first bracket is the same as for the simple single classification model 
and the second bracket expresses the result of separating the fluctuations into 
the separate columns. 


11. Proof of disappearance. It was asserted in (C) of Section 2 that if a polynomial is invariant under translation, its expansion will not involve any polykays with unit parts (indices “1”). We now proceed to give a proof.

Altering a finite set of numbers by translation through $\delta$ is equivalent to randomly pairing them with a set of numbers all of which equal $\delta$. The polykays for this new set are

$$k'_1 = \delta, \quad k'_{11} = \delta^2, \quad k'_{111} = \delta^3, \quad \cdots,$$

and all others are zero. In accordance with the pairing rules (see [2], [1]), the effect of translation is then to alter the polykays of the original set as follows:

$$k_1 \to k_1 + \delta,$$
$$k_{11} \to k_{11} + 2\delta k_1 + \delta^2,$$
$$k_2 \to k_2,$$
$$k_{111} \to k_{111} + 3\delta k_{11} + 3\delta^2 k_1 + \delta^3,$$
$$k_{12} \to k_{12} + \delta k_2,$$
$$k_3 \to k_3.$$
Now if the invariant polynomial is

$$c_1 k_1 + c_{11} k_{11} + c_2 k_2 + c_{111} k_{111} + c_{12} k_{12} + c_3 k_3 + \cdots$$

before translation, it is increased by

$$c_1 \delta + c_{11}(2\delta k_1 + \delta^2) + c_{111}(3\delta k_{11} + 3\delta^2 k_1 + \delta^3) + c_{12}\,\delta k_2 + \cdots$$
$$= \delta(c_1 + 2c_{11}k_1 + 3c_{111}k_{11} + c_{12}k_2 + \cdots) + \delta^2(c_{11} + 3c_{111}k_1 + \cdots) + \cdots,$$

and since this must vanish for all $\delta$, we have

$$c_1 + 2c_{11}k_1 + 3c_{111}k_{11} + c_{12}k_2 + \cdots = 0,$$
$$c_{11} + 3c_{111}k_1 + \cdots = 0,$$

and so on. Now, the original finite set was at our disposal, and if we multiply each element in it by $\epsilon$, we shall multiply each of its polykays by $\epsilon$ raised to the degree of the polykay. The last set of equations becomes

$$c_1 + \epsilon(2c_{11}k_1) + \epsilon^2(3c_{111}k_{11} + c_{12}k_2) + \epsilon^3(4c_{1111}k_{111} + 2c_{112}k_{12} + c_{13}k_3) + \cdots = 0,$$
$$c_{11} + \epsilon(3c_{111}k_1) + \epsilon^2(6c_{1111}k_{11} + c_{112}k_2) + \cdots = 0,$$

whence, comparing coefficients along a triangular path, we deduce the vanishing of $c_1$, $c_{11}$, $c_{111}$, $c_{12}$, $c_{1111}$, $c_{112}$, $c_{13}$, and so on. Hence we have the result stated.
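The translation rules can be verified directly. The sketch below writes the low-order polykays through their bracket expansions: $k_1 = \langle 1 \rangle$, $k_{11} = \langle 11 \rangle$, $k_2 = \langle 2 \rangle - \langle 11 \rangle$, $k_{111} = \langle 111 \rangle$, $k_{12} = \langle 12 \rangle - \langle 111 \rangle$, $k_3 = \langle 3 \rangle - 3\langle 12 \rangle + 2\langle 111 \rangle$. Apart from $k_{12}$ these expansions are not stated in this paper and are supplied here as an assumption. The set and the shift $\delta$ are arbitrary, and the comparison is exact.

```python
from fractions import Fraction
from itertools import permutations


def bracket(xs, powers):
    """Symmetric mean: average of prod x_i**p over ordered tuples of distinct indices."""
    total, count = Fraction(0), 0
    for idx in permutations(range(len(xs)), len(powers)):
        term = Fraction(1)
        for i, p in zip(idx, powers):
            term *= xs[i] ** p
        total += term
        count += 1
    return total / count


def polykays(xs):
    """Low-order polykays from their bracket expansions."""
    def b(*p):
        return bracket(xs, p)

    return {
        "1": b(1),
        "11": b(1, 1),
        "2": b(2) - b(1, 1),
        "111": b(1, 1, 1),
        "12": b(1, 2) - b(1, 1, 1),
        "3": b(3) - 3 * b(1, 2) + 2 * b(1, 1, 1),
    }


xs = [Fraction(v) for v in (2, -3, 5, 7, 0)]   # arbitrary illustrative set
d = Fraction(5, 2)                             # translation delta
P, Q = polykays(xs), polykays([x + d for x in xs])

predicted = {
    "1": P["1"] + d,
    "11": P["11"] + 2 * d * P["1"] + d**2,
    "2": P["2"],
    "111": P["111"] + 3 * d * P["11"] + 3 * d**2 * P["1"] + d**3,
    "12": P["12"] + d * P["2"],
    "3": P["3"],
}
print(all(Q[name] == predicted[name] for name in predicted))
```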


REFERENCES 
[1] John W. Tukey, “Some sampling simplified,” J. Amer. Stat. Assoc., Vol. 45 (1950), pp. 501–519.

[2] John W. Tukey, “Keeping moment-like sampling computations simple,” Ann. Math. Stat., Vol. 27 (1956), pp. 37–54.







MATRIX METHODS IN COMPONENTS OF VARIANCE AND 
COVARIANCE ANALYSIS 


By S. R. SEARLE 


New Zealand Dairy Board 


Summary. The sampling variance of the least squares estimates of the com- 
ponents of variance in an unbalanced (non-orthogonal) one-way classification 
and the large sample variances of the maximum likelihood estimates of these 
quantities are summarized in a paper by Crump [1]. The present paper outlines 
a method of obtaining these results by the use of matrix algebra, and extends 
them to the sampling variances of estimates of components of covariance when 
two variables are considered. The methods are also used to obtain the large 
sample variance-covariance matrix of the maximum likelihood estimates of the 
components of variance and covariance. 


Part I. COMPONENTS OF VARIANCE


1. Model and analysis of variance. We are concerned with data in a 1-way classification with unequal numbers of observations in the classes. The linear model is taken as

$$x_{ij} = \mu + \alpha_i + e_{ij},$$

where $x_{ij}$ is the $j$th observation in the $i$th class. We will assume that there are $c$ such classes ($i = 1, \cdots, c$), the $i$th class containing $n_i$ observations ($j = 1, \cdots, n_i$); and let $\sum n_i = N$. $\mu$ is a general mean, and $\{\alpha_i\}$ and $\{e_{ij}\}$ are random samples of size $c$ and $N$ from two normally distributed populations having zero means and variances $\sigma_a^2$ and $\sigma_e^2$, respectively. This is Eisenhart’s Model II [3] and it is to this model that the discussion confines itself. The problem is to find the sampling variances of the estimates of $\sigma_a^2$ and $\sigma_e^2$ based on the usual analysis of variance of between and within classes. These estimates are

(1) $$\hat\sigma_e^2 = \frac{1}{N - c}\left[\sum_{i,j} x_{ij}^2 - \sum_i n_i \bar{x}_i^2\right],$$

(2) $$\hat\sigma_a^2 = \frac{1}{f}\left[\frac{1}{c - 1}\left(\sum_i n_i \bar{x}_i^2 - N\bar{x}^2\right) - \hat\sigma_e^2\right],$$

where $f = \frac{1}{c - 1}\left(N - \sum_i n_i^2/N\right)$, $\bar{x}_i = \frac{1}{n_i}\sum_j x_{ij}$, and $\bar{x} = \frac{1}{N}\sum_{i,j} x_{ij}$.

2. Normal theory. In general if $x_1, \cdots, x_N$ is a set of multivariate normally distributed random variables with variance-covariance matrix $V$, and vector of means zero, their distribution function is given by

$$dH(x_1, \cdots, x_N) = (2\pi)^{-N/2} |V|^{-1/2} \exp\{-\tfrac{1}{2}\mathbf{x}'V^{-1}\mathbf{x}\}\, dx_1 \cdots dx_N,$$

where $\mathbf{x}'$ is the row vector $(x_1 \cdots x_N)$.








Received April 18, 1955; revised January 17, 1956. 
737 








738 S. R. SEARLE 


If $y$ is a function of the $x$'s, $y = \mathbf{x}'F\mathbf{x}$, defined by the symmetric matrix $F$, then the characteristic function of $y$ (with parameter $t$) is

$$\int \cdots \int e^{ity}\, dH(x_1, \cdots, x_N),$$

and by the use of Aitken’s Integral [7], this can be shown to be equal to $|I - 2itVF|^{-1/2}$, where $I$ is a unit matrix. Then, if $K_r^{(y)}$ is the $r$th cumulant of $y$,

$$\sum_r K_r^{(y)} \frac{(it)^r}{r!} = -\tfrac{1}{2}\log|I - 2itVF|.$$

By making use of the properties of the eigenvalues of a symmetric matrix, it has been shown ([9], p. 40; [10], p. 131; and [6], p. 247) that

$$K_r^{(y)} = 2^{r-1}(r - 1)!\,\operatorname{trace}(VF)^r.$$

For $r = 2$, this gives the result

(3) $$\operatorname{variance}(y) = 2\operatorname{trace}(VF)^2.$$

It is this principle that is used to find the sampling variances of $\hat\sigma_e^2$ and $\hat\sigma_a^2$ by expressing them in the form $\mathbf{x}'F\mathbf{x}$.
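The cumulant formula can be checked numerically without any sampling. The sketch below uses a hypothetical positive-definite $V$ and a hypothetical symmetric $F$, replaces the imaginary argument $it$ by a small real $s$ (so that the logarithm is real), and compares $-\tfrac{1}{2}\log|I - 2sVF|$ with the series $\sum_r K_r s^r/r!$ built from $K_r = 2^{r-1}(r - 1)!\operatorname{trace}(VF)^r$. The case $r = 2$ is equation (3).

```python
import math

import numpy as np

rng = np.random.default_rng(1)
N = 5
L = rng.normal(size=(N, N))
V = L @ L.T + np.eye(N)          # hypothetical positive-definite covariance matrix
G = rng.normal(size=(N, N))
F = (G + G.T) / 2                # hypothetical symmetric matrix defining y = x'Fx
M = V @ F


def cumulant(r):
    """K_r = 2^(r-1) (r-1)! trace((VF)^r)."""
    tr = float(np.trace(np.linalg.matrix_power(M, r)))
    return 2 ** (r - 1) * math.factorial(r - 1) * tr


# Real argument s in place of it, small enough that |2 s lambda| <= 1/4 for every eigenvalue.
s = 0.125 / float(np.max(np.abs(np.linalg.eigvals(M))))
lhs = float(-0.5 * np.log(np.linalg.det(np.eye(N) - 2 * s * M)))
rhs = sum(cumulant(r) * s**r / math.factorial(r) for r in range(1, 40))
print(lhs, rhs)
print(cumulant(2), 2 * float(np.trace(M @ M)))   # equation (3): variance of y
```

Using a real argument sidesteps complex logarithms; the power series in $s$ is the same one that appears in the characteristic-function identity above.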


3. Sampling variances of the least squares estimates. 

Notation. It will simplify procedure if we write $a$ for $\sigma_a^2$ and $e$ for $\sigma_e^2$, and similarly $\hat a$ and $\hat e$ for the estimates.

Let the row vector of observations, $(x_{11} \cdots x_{1n_1}, \cdots, x_{c1} \cdots x_{cn_c})$, be written as $\mathbf{x}'$. Then arraying the data in the order of $\mathbf{x}'$ it is seen that $V$ is a square matrix of order $N$, the only non-zero elements being $c$ square sub-matrices of order $n_i$ ($i = 1, \cdots, c$), lying along the diagonal, each with diagonal terms $a + e$ and non-diagonal terms $a$. Matrices of this particular form we will call $A$ matrices, $A_i$ being defined in general as a square matrix of order $n_i$, with $a_i$ in all its diagonal terms and $b_i$ everywhere else. The matrix of order $N$ whose only non-zero sub-matrices are $A_i$'s in the diagonal will be termed a $C$ matrix. Thus $V$ is a $C$ matrix, with $a_i = a + e$ and $b_i = a$.

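The quadratic form for $(N - c)\hat e$ can now be expressed as

$$ (N - c)\hat e = x'F_1x, $$

where $F_1$ is a C matrix with $a_i = 1 - 1/n_i$ and $b_i = -1/n_i$. Thus from (3)

$$ (N - c)^2\operatorname{var}(\hat e) = 2\,\operatorname{trace}(VF_1)^2, \tag{4} $$

where $V$ and $F_1$ are C matrices. Now the product of two C matrices is itself a C matrix, and

$$ \operatorname{trace} C^2 = \sum_i n_i[a_i^2 + (n_i - 1)b_i^2]. \tag{5} $$

Combining these results leads to the well-known expression below. Here $VF_1$ is the C matrix with $a_i = e(1 - 1/n_i)$ and $b_i = -e/n_i$.

$$ \operatorname{var}(\hat e) = \frac{2e^2}{N - c}. \tag{6} $$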
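The sketch below is a minimal numerical check of (4) to (6). It assumes hypothetical class sizes and component values, builds $V$ and $F_1$ as C matrices, and compares $2\,\operatorname{trace}(VF_1)^2/(N - c)^2$ with $2e^2/(N - c)$. The helper names `c_matrix` and `trace_c_squared` are illustrative only.

```python
import numpy as np


def c_matrix(sizes, diag_terms, offdiag_terms):
    """C matrix: A_i blocks (a_i on the diagonal, b_i elsewhere) down the diagonal."""
    N = sum(sizes)
    C = np.zeros((N, N))
    start = 0
    for n, a_i, b_i in zip(sizes, diag_terms, offdiag_terms):
        block = (a_i - b_i) * np.eye(n) + b_i * np.ones((n, n))
        C[start:start + n, start:start + n] = block
        start += n
    return C


def trace_c_squared(sizes, diag_terms, offdiag_terms):
    """Equation (5): trace C^2 = sum_i n_i [a_i^2 + (n_i - 1) b_i^2]."""
    return sum(n * (a_i**2 + (n - 1) * b_i**2)
               for n, a_i, b_i in zip(sizes, diag_terms, offdiag_terms))


sizes = [2, 3, 5]            # hypothetical unbalanced class sizes n_i
a, e = 1.5, 0.8              # hypothetical sigma_a^2 and sigma_e^2
N, c = sum(sizes), len(sizes)

V = c_matrix(sizes, [a + e] * c, [a] * c)
F1 = c_matrix(sizes, [1 - 1 / n for n in sizes], [-1 / n for n in sizes])

VF1 = V @ F1
var_e_hat = 2 * np.trace(VF1 @ VF1) / (N - c) ** 2   # equation (4)
print(var_e_hat, 2 * e**2 / (N - c))                # should agree with (6)
```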







The variance of $\hat a$ is arrived at in a similar fashion, in the course of which two further matrix types arise. The first we will call a K matrix, $K_{ij}$ being a matrix of order $n_i \times n_j$ with $k_{ij}$ in all its terms. The second is termed a J matrix. It is a square matrix of order $N$: a C matrix with the zero sub-matrices replaced by $K_{ij}$'s.

In terms of these matrices one can show that the quadratic form of (2) can be expressed as

$$ f\hat a = x'F_2x, \tag{7} $$

where $F_2$ is a J matrix with

$$ a_i = \frac{(N - 1)(1/n_i - c/N)}{(c - 1)(N - c)}, \qquad b_i = a_i + \frac{1}{N - c}, \qquad k_{ij} = -\frac{1}{N(c - 1)}. $$

Thus $VF_2$ is the product of a C and a J matrix, which can be shown to be a J matrix with $k_{ij}$ independent of $j$ (write $k_{ij} = k_i$). For such a matrix

$$ \operatorname{trace}(J^2) = \sum_i n_i[a_i^2 + (n_i - 1)b_i^2] + \Big(\sum_i n_ik_i\Big)^2 - \sum_i n_i^2k_i^2. \tag{8} $$

Using these results, $2\,\operatorname{trace}(VF_2)^2$ is obtainable. This gives $\operatorname{var}(\hat a)$, which can be written as


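$$ \operatorname{var}(\hat a) = \frac{1}{f^2}\left[\frac{2e^2(N - 1)}{(c - 1)(N - c)} + \frac{4ea(N^2 - S_2)}{N(c - 1)^2} + \frac{2a^2(N^2S_2 + S_2^2 - 2NS_3)}{N^2(c - 1)^2}\right], \tag{9} $$

where $S_2 = \sum n_i^2$ and $S_3 = \sum n_i^3$. This is the result given in Crump [2].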
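The sketch below checks (7) to (9) numerically. It assumes that $f = (N^2 - S_2)/[N(c - 1)]$, the usual coefficient of $a$ in the expected between-classes mean square. The symbol $f$ itself is defined before this excerpt. Class sizes and component values are hypothetical. $F_2$ is built as "between mean square minus within mean square", and its entries are compared with those stated under (7).

```python
import numpy as np

sizes = [2, 3, 5]          # hypothetical class sizes n_i
a, e = 1.5, 0.8            # hypothetical sigma_a^2, sigma_e^2
n = np.array(sizes, dtype=float)
N, c = n.sum(), len(sizes)
S2, S3 = np.sum(n**2), np.sum(n**3)
f = (N**2 - S2) / (N * (c - 1))      # assumed definition of f

# B is the block matrix of ones (1 when two observations share a class).
labels = np.repeat(np.arange(c), sizes)
B = (labels[:, None] == labels[None, :]).astype(float)
n_of = n[labels]                      # class size of each observation
P = B / n_of[:, None]                 # x'Px = sum n_i xbar_i^2
I = np.eye(int(N))

V = e * I + a * B                     # C matrix with a_i = a + e, b_i = a
F1 = I - P                            # (N - c) e_hat = x'F1 x
F2 = (P - 1.0 / N) / (c - 1) - F1 / (N - c)   # f a_hat = MS_between - MS_within

# Entries of F2 stated under (7).
a_i = (N - 1) * (1 / n - c / N) / ((c - 1) * (N - c))
b_i = a_i + 1 / (N - c)
k = -1 / (N * (c - 1))

VF2 = V @ F2
var_a_trace = 2 * np.trace(VF2 @ VF2) / f**2
var_a_closed = (2 * e**2 * (N - 1) / ((c - 1) * (N - c))
                + 4 * e * a * (N**2 - S2) / (N * (c - 1) ** 2)
                + 2 * a**2 * (N**2 * S2 + S2**2 - 2 * N * S3) / (N**2 * (c - 1) ** 2)) / f**2
print(var_a_trace, var_a_closed)
```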

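It is also of interest to find the sampling variance of the estimate of the total variance, $\hat a + \hat e$. By these methods it can be shown that the $VF$ matrix for the expression $(c - 1)(\hat e + f\hat a)$, i.e., for $\sum_i n_i\bar x_i^2 - N\bar x^2$, is the J matrix made up entirely of K sub-matrices,

$$ \begin{pmatrix} K_{11} & \cdots & K_{1c} \\ \vdots & & \vdots \\ K_{c1} & \cdots & K_{cc} \end{pmatrix}, $$

with $k_{ii} = (e + n_ia)(1/n_i - 1/N)$ and $k_{ij} = -(e + n_ia)/N$ for $i \ne j$. This leads to the result

$$ \operatorname{covariance}(\hat a, \hat e) = -\frac{1}{f}\operatorname{var}(\hat e), $$

which gives

$$ \operatorname{variance}(\hat a + \hat e) = \Big(1 - \frac{2}{f}\Big)\operatorname{var}(\hat e) + \operatorname{var}(\hat a). \tag{10} $$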
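The covariance result and (10) can be checked with the standard identity $\operatorname{cov}(x'A_1x, x'A_2x) = 2\,\operatorname{trace}(VA_1VA_2)$ for symmetric $A_1, A_2$ and $x \sim N(0, V)$. That identity is the two-form generalization of (3). The sketch below writes $\hat a$ and $\hat e$ as quadratic forms. As before, $f = (N^2 - S_2)/[N(c - 1)]$ is an assumed definition, and the sizes and components are hypothetical.

```python
import numpy as np

sizes = [2, 3, 5]          # hypothetical class sizes
a, e = 1.5, 0.8            # hypothetical sigma_a^2, sigma_e^2
n = np.array(sizes, dtype=float)
N, c = n.sum(), len(sizes)
f = (N**2 - np.sum(n**2)) / (N * (c - 1))   # assumed definition of f

labels = np.repeat(np.arange(c), sizes)
B = (labels[:, None] == labels[None, :]).astype(float)
P = B / n[labels][:, None]
I = np.eye(len(labels))
V = e * I + a * B

G_e = (I - P) / (N - c)                                   # e_hat = x'G_e x
G_a = ((P - 1.0 / N) / (c - 1) - (I - P) / (N - c)) / f   # a_hat = x'G_a x


def cov_quadratic_forms(A1, A2, V):
    """cov(x'A1x, x'A2x) = 2 trace(V A1 V A2) for x ~ N(0, V), A1 and A2 symmetric."""
    return 2 * np.trace(V @ A1 @ V @ A2)


var_e = cov_quadratic_forms(G_e, G_e, V)
var_a = cov_quadratic_forms(G_a, G_a, V)
cov_ae = cov_quadratic_forms(G_a, G_e, V)
var_total = cov_quadratic_forms(G_a + G_e, G_a + G_e, V)
print(cov_ae, -var_e / f)                         # covariance result
print(var_total, (1 - 2 / f) * var_e + var_a)     # equation (10)
```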


4. Large sample variance of maximum likelihood estimates. The likelihood of the sample, $L$, is given by

$$ e^L = \Big(\frac{1}{2\pi}\Big)^{N/2}|V|^{-1/2}\exp(-\tfrac12 x'V^{-1}x). $$










Thus

$$ L = \text{constant} - \tfrac12\log|V| - \tfrac12 x'V^{-1}x. $$

Now $V$ is a C matrix with $a_i = a + e$ and $b_i = a$. It is easily shown that the inverse of a C matrix is a C matrix with terms $A_i^{-1}$, $A_i^{-1}$ itself being an A matrix. Also

$$ |C| = \prod_i |A_i| = \prod_i (a_i - b_i)^{n_i - 1}[a_i + (n_i - 1)b_i]. $$

These results can be applied to the expression for $L$, which is then readily differentiable with respect to $a$ and $e$. Consider the matrix whose terms are minus the expected values of the second order partial derivatives of $L$ with respect to $a$ and $e$. Its inverse gives the large-sample variances and covariance of the maximum likelihood estimates of the variance components. These results, due to Crump and quoted here for completeness, are (setting $a/e = Q$)

$$ \begin{aligned} \operatorname{var}(\hat e) &= 2e^2\sum w_i^2\Big/D, \\ \operatorname{var}(\hat a) &= 2e^2\Big[N - c + \sum w_i^2/n_i^2\Big]\Big/D, \\ \operatorname{cov}(\hat a, \hat e) &= -2e^2\Big(\sum w_i^2/n_i\Big)\Big/D, \end{aligned} \tag{11} $$

where $w_i = n_ie/(e + n_ia) = n_i/(1 + Qn_i)$ and $D = N\sum w_i^2 - (\sum w_i)^2$. Thus we have established the well-known results for three quantities:

- the least squares estimates of the components of variance;
- the sampling variances and covariance of these estimates;
- the large-sample variances of the maximum likelihood estimates.

We now proceed to find the same results for the components of covariance.
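As a cross-check on (11), the sketch below computes the expected information for $(e, a)$ directly. It uses the standard Gaussian identity $I_{kl} = \tfrac12\operatorname{trace}(V^{-1}V_kV^{-1}V_l)$, with $V = eI + aB$ and zero means as assumed in this section. It then inverts that matrix and compares the result with Crump's closed forms. The design and component values are hypothetical.

```python
import numpy as np


def fisher_inverse(sizes, a, e):
    """Invert the expected information for (e, a) in the zero-mean one-way model.

    Uses I_kl = 0.5 trace(V^-1 V_k V^-1 V_l) with V = e I + a B,
    so that dV/de = I and dV/da = B.
    """
    labels = np.repeat(np.arange(len(sizes)), sizes)
    B = (labels[:, None] == labels[None, :]).astype(float)
    I = np.eye(len(labels))
    V_inv = np.linalg.inv(e * I + a * B)
    derivs = [I, B]
    info = np.array([[0.5 * np.trace(V_inv @ Dk @ V_inv @ Dl) for Dl in derivs]
                     for Dk in derivs])
    return np.linalg.inv(info)          # rows/columns ordered (e, a)


def crump_formulas(sizes, a, e):
    """Large-sample variances and covariance from (11)."""
    n = np.array(sizes, dtype=float)
    N, c = n.sum(), len(n)
    w = n * e / (e + n * a)
    D = N * np.sum(w**2) - np.sum(w) ** 2
    var_e = 2 * e**2 * np.sum(w**2) / D
    var_a = 2 * e**2 * (N - c + np.sum(w**2 / n**2)) / D
    cov_ae = -2 * e**2 * np.sum(w**2 / n) / D
    return var_e, var_a, cov_ae


sizes, a, e = [2, 3, 5], 1.5, 0.8      # hypothetical design and components
cov = fisher_inverse(sizes, a, e)
var_e, var_a, cov_ae = crump_formulas(sizes, a, e)
print(cov)
print(var_e, var_a, cov_ae)
```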


Part II. COMPONENTS OF COVARIANCE


5. Least squares estimation. We consider the problem of the components of covariance between two variables $x$ and $y$, each based on the same linear model in a 1-way classification, under the assumptions of Eisenhart's Model II. $a'$, $e'$ and $a''$, $e''$ are taken as the variance components of $y$ and the covariance components between $x$ and $y$, respectively, following directly from the notation of paragraph 3.

The least squares estimates of $a''$ and $e''$ obtained from the Analysis of Covariance are the same functions of the sums of products of $x$ and $y$ as $\hat a$ and $\hat e$ were of the sums of squares in the Analysis of Variance:

$$ \begin{aligned} \hat e'' &= \frac{1}{N - c}\Big[\sum_{i,j} x_{ij}y_{ij} - \sum_i n_i\bar x_i\bar y_i\Big], \\ f\hat a'' &= \frac{1}{c - 1}\Big[\sum_i n_i\bar x_i\bar y_i - N\bar x\bar y\Big] - \hat e''. \end{aligned} \tag{12} $$

To find the variances of these estimates we use the same methods as in finding the variances of $\hat a$ and $\hat e$. We express $\hat a''$ and $\hat e''$ in the form $z'Fz$ and, using a variance-covariance matrix $V$, evaluate $2\,\operatorname{trace}(VF)^2$. In this case we are concerned with a random sample of $2N$ variables $(x_{11} \cdots x_{cn_c}, y_{11} \cdots y_{cn_c})$. We assume these to be multivariate normally distributed with variance-covariance matrix $V_1$, say. A little consideration will show that $V_1$, associated with the vector $(x'\ y')$, is

$$ V_1 = \begin{pmatrix} V & V'' \\ V'' & V' \end{pmatrix}, $$


where V’ and V” are the same C matrices as V, but in terms of a’, e’ and a”, e” 
respectively. This notation will not be confused with the usual use of primes 
to denote transpose matrices, since no transposed matrix enters into this analysis. 

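We now proceed to find the matrix expressions for the sums of products. Writing $z' = (x', y')$, the following results hold:

$$ \sum_{i,j} x_{ij}y_{ij} = \tfrac12 z'\begin{pmatrix} 0 & I \\ I & 0 \end{pmatrix}z, $$

$$ \sum_i n_i\bar x_i\bar y_i = \tfrac12 z'\begin{pmatrix} 0 & C \\ C & 0 \end{pmatrix}z, \quad \text{with } a_i = b_i = 1/n_i, $$

$$ N\bar x\bar y = \tfrac12 z'\begin{pmatrix} 0 & K_N \\ K_N & 0 \end{pmatrix}z, \quad K_N \text{ being an } N \times N \text{ K matrix with terms } 1/N. $$

These expressions give

$$ (N - c)\hat e'' = \tfrac12 z'\begin{pmatrix} 0 & F_1 \\ F_1 & 0 \end{pmatrix}z, $$

and thus the $VF$ matrix for $(N - c)\hat e''$ is

$$ \begin{pmatrix} V & V'' \\ V'' & V' \end{pmatrix}\cdot\tfrac12\begin{pmatrix} 0 & F_1 \\ F_1 & 0 \end{pmatrix} = \tfrac12\begin{pmatrix} V''F_1 & VF_1 \\ V'F_1 & V''F_1 \end{pmatrix}. $$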
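Now each of the four sub-matrices in this expression is the same $VF_1$ as used for obtaining $\operatorname{var}(\hat e)$ in (4), expressed in the appropriate components. Therefore, in terms of the general result (3), $\operatorname{trace}(VF)^2$ for $(N - c)\hat e''$ comes from a double application of (5), namely

$$ \tfrac12\sum_i n_i\big[a_i''^2 + (n_i - 1)b_i''^2 + a_ia_i' + (n_i - 1)b_ib_i'\big], \tag{13} $$

where $(a_i, b_i)$, $(a_i', b_i')$ and $(a_i'', b_i'')$ are the terms of $VF_1$, $V'F_1$ and $V''F_1$ respectively. This leads to the result

$$ \operatorname{var}(\hat e'') = \frac{e''^2 + ee'}{N - c}. $$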
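The bivariate result can be checked the same way by building the $2N \times 2N$ matrix $V_1$ explicitly. The sketch below uses hypothetical components, chosen so that $V_1$ is positive definite. It evaluates $2\,\operatorname{trace}(V_1G)^2/(N - c)^2$ with $G = \tfrac12\begin{pmatrix} 0 & F_1 \\ F_1 & 0 \end{pmatrix}$ and compares it with $(e''^2 + ee')/(N - c)$. It also confirms that $z'Gz$ reproduces the within-classes sum of products for arbitrary data.

```python
import numpy as np

sizes = [2, 3, 5]                      # hypothetical class sizes
e, e1, e2 = 1.0, 2.0, 0.5              # e, e', e''  (within-class components)
a, a1, a2 = 1.5, 0.7, 0.4              # a, a', a''  (between-class components)

n = np.array(sizes, dtype=float)
N, c = int(n.sum()), len(sizes)
labels = np.repeat(np.arange(c), sizes)
B = (labels[:, None] == labels[None, :]).astype(float)
I, Z = np.eye(N), np.zeros((N, N))
P = B / n[labels][:, None]
F1 = I - P

V, Vp, Vpp = e * I + a * B, e1 * I + a1 * B, e2 * I + a2 * B
V1 = np.block([[V, Vpp], [Vpp, Vp]])          # covariance of z = (x', y')'
G = 0.5 * np.block([[Z, F1], [F1, Z]])        # (N - c) e''_hat = z'Gz

V1G = V1 @ G
var_e2_trace = 2 * np.trace(V1G @ V1G) / (N - c) ** 2
var_e2_closed = (e2**2 + e * e1) / (N - c)
print(var_e2_trace, var_e2_closed)

# z'Gz reproduces the within-class sum of products for arbitrary data.
x = np.linspace(-1.0, 1.0, N)
y = np.cos(np.arange(N, dtype=float))
xbar = np.bincount(labels, weights=x) / n
ybar = np.bincount(labels, weights=y) / n
within_sp = np.sum(x * y) - np.sum(n * xbar * ybar)
z = np.concatenate([x, y])
print(z @ G @ z, within_sp)
```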


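A similar procedure holds for $\operatorname{var}(\hat a'')$. From (12), writing $C$ for the C matrix with $a_i = b_i = 1/n_i$, $f\hat a''$ can be written as

$$ f\hat a'' = \tfrac12 z'\left[\frac{1}{c - 1}\begin{pmatrix} 0 & C - K_N \\ C - K_N & 0 \end{pmatrix} - \frac{1}{N - c}\begin{pmatrix} 0 & I - C \\ I - C & 0 \end{pmatrix}\right]z = \tfrac12 z'\begin{pmatrix} 0 & F_2 \\ F_2 & 0 \end{pmatrix}z, $$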







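where $F_2$ is the J matrix defined in (7). Therefore

$$ \operatorname{var}(f\hat a'') = 2\,\operatorname{trace}\left[\begin{pmatrix} V & V'' \\ V'' & V' \end{pmatrix}\tfrac12\begin{pmatrix} 0 & F_2 \\ F_2 & 0 \end{pmatrix}\right]^2 = \tfrac12\,\operatorname{trace}\begin{pmatrix} V''F_2 & VF_2 \\ V'F_2 & V''F_2 \end{pmatrix}^2. $$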


The sub-matrices of this expression are the same J matrix as considered in obtaining $\operatorname{var}(\hat a)$. Therefore, by a double application of (8) similar to (13), $\operatorname{var}(f\hat a'')$ is obtained. This leads to the result that $\operatorname{var}(\hat a'')$ equals

$$ \frac{1}{f^2}\left[\frac{(N - 1)(e''^2 + ee')}{(N - c)(c - 1)} + \frac{(N^2 - S_2)(2e''a'' + e'a + ea')}{N(c - 1)^2} + \frac{(N^2S_2 + S_2^2 - 2NS_3)(a''^2 + aa')}{N^2(c - 1)^2}\right]. \tag{14} $$

This is the same expression as $\operatorname{var}(\hat a)$ in (9), with $2e^2$, $4ea$ and $2a^2$ replaced by $(e''^2 + ee')$, $(2e''a'' + ea' + e'a)$ and $(a''^2 + aa')$ respectively.
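A numerical check of (14), under the same assumptions as the earlier sketches: hypothetical components, and $f = (N^2 - S_2)/[N(c - 1)]$ assumed. The check computes $2\,\operatorname{trace}(V_1G)^2/f^2$ with $G = \tfrac12\begin{pmatrix} 0 & F_2 \\ F_2 & 0 \end{pmatrix}$ and compares it with the closed form.

```python
import numpy as np

sizes = [2, 3, 5]                      # hypothetical class sizes
e, e1, e2 = 1.0, 2.0, 0.5              # e, e', e''
a, a1, a2 = 1.5, 0.7, 0.4              # a, a', a''

n = np.array(sizes, dtype=float)
N, c = int(n.sum()), len(sizes)
S2, S3 = np.sum(n**2), np.sum(n**3)
f = (N**2 - S2) / (N * (c - 1))        # assumed definition of f

labels = np.repeat(np.arange(c), sizes)
B = (labels[:, None] == labels[None, :]).astype(float)
I, Z = np.eye(N), np.zeros((N, N))
P = B / n[labels][:, None]
F2 = (P - 1.0 / N) / (c - 1) - (I - P) / (N - c)     # J matrix of (7)

V, Vp, Vpp = e * I + a * B, e1 * I + a1 * B, e2 * I + a2 * B
V1 = np.block([[V, Vpp], [Vpp, Vp]])
G = 0.5 * np.block([[Z, F2], [F2, Z]])                # f a''_hat = z'Gz

V1G = V1 @ G
var_a2_trace = 2 * np.trace(V1G @ V1G) / f**2
var_a2_closed = ((N - 1) * (e2**2 + e * e1) / ((N - c) * (c - 1))
                 + (N**2 - S2) * (2 * e2 * a2 + e1 * a + e * a1) / (N * (c - 1) ** 2)
                 + (N**2 * S2 + S2**2 - 2 * N * S3) * (a2**2 + a * a1) / (N**2 * (c - 1) ** 2)) / f**2
print(var_a2_trace, var_a2_closed)
```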
Finally, it can be shown that equation (10) holds for $\hat e''$ and $\hat a''$, namely

$$ \operatorname{var}(\hat a'' + \hat e'') = \Big(1 - \frac{2}{f}\Big)\operatorname{var}(\hat e'') + \operatorname{var}(\hat a''). \tag{15} $$

Thus far we have found the variances of the least squares estimates of the components of covariance. The next step is to assess the efficiency of these estimates by finding the large-sample variances of the maximum likelihood estimates of $e''$ and $a''$.


6. Maximum likelihood estimates: large sample variances.

6.1. $L$, the likelihood function for the sample of $2N$ observations, is given by

$$ e^L = \Big(\frac{1}{2\pi}\Big)^N|V_1|^{-1/2}\exp(-\tfrac12 z'V_1^{-1}z), $$

where

$$ V_1 = \begin{pmatrix} V & V'' \\ V'' & V' \end{pmatrix}, $$

with $a_i = a + e$ and $b_i = a$ in $V$ (and similarly the primed terms in $V'$ and $V''$) by the definition of paragraph 3.

We will now consider an orthogonal transformation of $z$, $w = Tz$, the variance-covariance matrix appropriate to $w$ being $W$. With $TT' = I$, $W = TV_1T'$, $V_1 = T'WT$, and

$$ e^L = \Big(\frac{1}{2\pi}\Big)^N|W|^{-1/2}\exp(-\tfrac12 w'W^{-1}w). \tag{16} $$







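The value of $T$ which simplifies $V_1$ most easily is

$$ T = \operatorname{diag}(H_1, \dots, H_c, H_1, \dots, H_c), $$

where $H_i$ is a matrix of order $n_i$ having terms

$$ h_{pq} = \begin{cases} 1/\sqrt{n_i} & \text{for } p = 1 \text{ and all } q, \\ 0 & \text{for } p > 1 \text{ and } q > p, \\ 1/\sqrt{p(p - 1)} & \text{for } p > 1 \text{ and } q < p, \\ -(p - 1)/\sqrt{p(p - 1)} & \text{for } p > 1 \text{ and } q = p, \end{cases} \qquad p, q = 1, \dots, n_i. $$

As an example, for $n_i = 4$,

$$ H_i = \begin{pmatrix} 1/\sqrt4 & 1/\sqrt4 & 1/\sqrt4 & 1/\sqrt4 \\ 1/\sqrt2 & -1/\sqrt2 & 0 & 0 \\ 1/\sqrt6 & 1/\sqrt6 & -2/\sqrt6 & 0 \\ 1/\sqrt{12} & 1/\sqrt{12} & 1/\sqrt{12} & -3/\sqrt{12} \end{pmatrix}. $$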


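For this value of $T$,

$$ W = TV_1T' = \begin{pmatrix} D & D'' \\ D'' & D' \end{pmatrix}, \qquad D = \operatorname{diag}(D_1, \dots, D_c), \tag{17} $$

and similarly for $D'$ and $D''$. Here $D_i$ is a diagonal matrix of order $n_i$, the leading term being $(e + n_ia)$ and the remaining terms $e$. $D_i'$ and $D_i''$ are the same with primed and double-primed components.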
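The matrix $H_i$ is a Helmert-type orthogonal matrix. The sketch below builds it from the definition of $h_{pq}$ and confirms the example for $n_i = 4$. It also checks that $H_iA_iH_i' = \operatorname{diag}(e + n_ia, e, \dots, e)$, which is the form of $D_i$ in (17). The function name `helmert` and the numbers are illustrative assumptions.

```python
import numpy as np


def helmert(n):
    """Matrix H_i of the text: first row 1/sqrt(n), then Helmert contrast rows."""
    H = np.zeros((n, n))
    H[0, :] = 1 / np.sqrt(n)
    for p in range(2, n + 1):                 # p is the 1-based row index used in the text
        scale = np.sqrt(p * (p - 1))
        H[p - 1, : p - 1] = 1 / scale         # q < p
        H[p - 1, p - 1] = -(p - 1) / scale    # q = p; entries with q > p stay 0
    return H


n_i, a, e = 4, 1.5, 0.8                       # hypothetical class size and components
H = helmert(n_i)
A_i = e * np.eye(n_i) + a * np.ones((n_i, n_i))  # block of V: a + e on the diagonal, a elsewhere
D_i = H @ A_i @ H.T
print(np.round(H, 4))
print(np.round(D_i, 10))   # diag(e + n_i a, e, e, e)
```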







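6.2. To obtain $|W|$ and $w'W^{-1}w$ of (16), first observe the following. An elementary matrix of the form

$$ M = \begin{pmatrix} \operatorname{diag}(m_1, \dots, m_n) & \operatorname{diag}(m_1'', \dots, m_n'') \\ \operatorname{diag}(m_1'', \dots, m_n'') & \operatorname{diag}(m_1', \dots, m_n') \end{pmatrix} $$

can, by permuting rows and columns symmetrically, be brought to block-diagonal form with $2 \times 2$ blocks $\begin{pmatrix} m_i & m_i'' \\ m_i'' & m_i' \end{pmatrix}$. Immediately this gives

$$ |M| = \prod_i (m_im_i' - m_i''^2). $$

Similarly $M^{-1}$ is itself an M matrix, with $m_i$, $m_i''$ and $m_i'$ replaced by $m_i'/(m_im_i' - m_i''^2)$, $-m_i''/(m_im_i' - m_i''^2)$ and $m_i/(m_im_i' - m_i''^2)$ respectively.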
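Both properties of the M matrix are easy to confirm numerically. The sketch below uses arbitrary hypothetical values of $m_i$, $m_i'$ and $m_i''$, chosen so that every $m_im_i' - m_i''^2$ is non-zero. It compares NumPy's determinant and inverse with the closed forms.

```python
import numpy as np


def m_matrix(m, m_pp, m_p):
    """M = [[diag(m), diag(m'')], [diag(m''), diag(m')]]."""
    return np.block([[np.diag(m), np.diag(m_pp)],
                     [np.diag(m_pp), np.diag(m_p)]])


m = np.array([2.0, 3.0, 1.5])        # hypothetical m_i
m_p = np.array([1.0, 2.5, 4.0])      # hypothetical m_i'
m_pp = np.array([0.5, -1.0, 0.3])    # hypothetical m_i''

M = m_matrix(m, m_pp, m_p)
delta = m * m_p - m_pp**2            # m_i m_i' - m_i''^2

det_formula = np.prod(delta)
# Inverse: m_i -> m_i'/delta_i, m_i'' -> -m_i''/delta_i, m_i' -> m_i/delta_i.
M_inv_formula = m_matrix(m_p / delta, -m_pp / delta, m / delta)
print(np.linalg.det(M), det_formula)
```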
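6.3. These results can be extended, and applied to $W$ as given in (17).

Notation. Write

$$ p_i = e + n_ia, \quad p_i' = e' + n_ia', \quad p_i'' = e'' + n_ia'', \qquad q = ee' - e''^2, $$

$$ r_i = p_ip_i' - p_i''^2 = (e + n_ia)(e' + n_ia') - (e'' + n_ia'')^2. $$

Then

$$ |W| = q^{N - c}\prod_i r_i, \qquad W^{-1} = \begin{pmatrix} \bar D' & -\bar D'' \\ -\bar D'' & \bar D \end{pmatrix}, \tag{18} $$

where $\bar D = \operatorname{diag}(\bar D_1, \dots, \bar D_c)$, and $\bar D_i$ is a diagonal matrix of order $n_i$ with leading term $p_i/r_i$ and other terms equal to $e/q$. $\bar D_i'$ and $\bar D_i''$ have the same form as $\bar D_i$ but with their numerators primed.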







Now $w'W^{-1}w = z'T'W^{-1}Tz$. Furthermore $V_1 = T'WT$. Since $W^{-1}$ is of the same form as $W$, it follows that $T'W^{-1}T$ has the same form as $V_1$: its sub-matrices are A matrices. This being so, notice that for a vector $x_i$ of $n_i$ $x$'s,

$$ \begin{aligned} x_i'A_ix_i &= (a_i - b_i)\sum_j x_{ij}^2 + b_ix_{i\cdot}^2 \\ &= (a_i - b_i)\Big(\sum_j x_{ij}^2 - n_i\bar x_i^2\Big) + n_i\bar x_i^2[a_i + (n_i - 1)b_i], \end{aligned} \tag{19} $$

where $x_{i\cdot} = \sum_j x_{ij} = n_i\bar x_i$. Therefore

$$ w'W^{-1}w = z'T'W^{-1}Tz = \sum_i (x_i'A_i'x_i + y_i'A_iy_i - 2x_i'A_i''y_i) \tag{20} $$

can be expressed as a sum of terms like (19). Now the A matrices in $V_1$ have $a_i = a + e$ and $b_i = a$, and $V_1 = T'WT$. The D matrices of $W$ have leading terms $e + n_ia$ and other terms $e$, while the D matrices of $W^{-1}$ have leading terms $p_i/r_i$ and other terms $e/q$. Therefore the A matrices of $T'W^{-1}T$ have

$$ a_i = [(n_i - 1)e/q + p_i/r_i]/n_i \quad\text{and}\quad b_i = (p_i/r_i - e/q)/n_i, $$

with $A_i'$ and $A_i''$ obtained in the same way from the primed and double-primed terms. This gives

$$ a_i - b_i = e/q \quad\text{and}\quad a_i + (n_i - 1)b_i = p_i/r_i. \tag{21} $$
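Because $T$ is orthogonal, $T'W^{-1}T = V_1^{-1}$. Relations (19) to (21) can therefore be checked on a single class by inverting its block of $V_1$ directly. The sketch below does this for hypothetical components and $n_i = 4$. It confirms the pattern used in (22): the $x$ block carries $e'/q$ and $p_i'/r_i$, the $y$ block carries $e/q$ and $p_i/r_i$, and the cross block carries $-e''/q$ and $-p_i''/r_i$.

```python
import numpy as np

n_i = 4                                   # hypothetical class size
e, e1, e2 = 1.0, 2.0, 0.5                 # e, e', e''
a, a1, a2 = 1.5, 0.7, 0.4                 # a, a', a''

ones, I = np.ones((n_i, n_i)), np.eye(n_i)
V_i, Vp_i, Vpp_i = e * I + a * ones, e1 * I + a1 * ones, e2 * I + a2 * ones
V1_i = np.block([[V_i, Vpp_i], [Vpp_i, Vp_i]])   # one class of V_1, ordered (x_i, y_i)
inv = np.linalg.inv(V1_i)                          # equals T'W^{-1}T for this class

q = e * e1 - e2**2
p, p1, p2 = e + n_i * a, e1 + n_i * a1, e2 + n_i * a2
r = p * p1 - p2**2

checks = []
for block, num_e, num_p in [(inv[:n_i, :n_i], e1, p1),      # x-x block
                            (inv[n_i:, n_i:], e, p),        # y-y block
                            (inv[:n_i, n_i:], -e2, -p2)]:   # x-y block
    a_blk, b_blk = block[0, 0], block[0, 1]                 # A-matrix terms
    checks.append((a_blk - b_blk, num_e / q,
                   a_blk + (n_i - 1) * b_blk, num_p / r))   # both sides of (21)
print(checks)

# Identity (19) for the x-x block and an arbitrary vector x_i.
A = inv[:n_i, :n_i]
a_x, b_x = A[0, 0], A[0, 1]
x = np.array([0.3, -1.2, 2.0, 0.5])
xbar = x.mean()
lhs_19 = x @ A @ x
rhs_19 = (a_x - b_x) * (np.sum(x**2) - n_i * xbar**2) + n_i * xbar**2 * (a_x + (n_i - 1) * b_x)
print(lhs_19, rhs_19)
```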
Substituting expressions (18) to (21) in (16) gives the likelihood as

$$ \begin{aligned} L = {} & -N\log(2\pi) - \tfrac12(N - c)\log q - \tfrac12\sum_i\log r_i - \tfrac12(e'X + eY - 2e''Z)/q \\ & - \tfrac12\sum_i(p_i'X_i + p_iY_i - 2p_i''Z_i)/r_i, \end{aligned} \tag{22} $$

where

$$ X = \sum_{i,j}x_{ij}^2 - \sum_i n_i\bar x_i^2, \text{ with expected value } (N - c)e; \qquad X_i = n_i\bar x_i^2, \text{ with expected value } p_i. \tag{23} $$

$Y$, $Y_i$ and $Z$, $Z_i$ are similar sums of squares of $y$ and sums of products of $x$ and $y$ respectively, with appropriate expected values.

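6.4. We want the large-sample variances of the maximum likelihood estimates of all six components of variance and covariance together. For this we require the $6 \times 6$ matrix whose terms are the expected values of the second order partial derivatives of $L$ with respect to $e, e', e''$ and $a, a', a''$. Call this matrix $L_2$, and consider the row vector of operators

$$ \partial = \Big(\frac{\partial}{\partial e}\ \ \frac{\partial}{\partial e'}\ \ \frac{\partial}{\partial e''}\ \ \frac{\partial}{\partial a}\ \ \frac{\partial}{\partial a'}\ \ \frac{\partial}{\partial a''}\Big). $$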





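Then $L_2 = E(\partial'\partial L)$. Applying this to (22), $L_2$ will involve the following terms:

$$ \partial'\partial\log q = \frac{1}{q^2}(q\,\partial'\partial q - \partial'q\,\partial q), $$

$$ \partial'\partial\Big(\frac{e}{q}\Big) = \frac{1}{q}\,\partial'\partial e - \frac{1}{q^2}(\partial'e\,\partial q + \partial'q\,\partial e + e\,\partial'\partial q) + \frac{2}{q^3}(e\,\partial'q\,\partial q), \tag{24} $$

and similar expressions for $\partial'\partial\log r_i$ and $\partial'\partial(p_i/r_i)$. Write

$$ \partial q = (e'\ \ e\ \ {-2e''}\ \ 0\ \ 0\ \ 0) = s, \qquad \partial'\partial q = S, $$

$$ \partial r_i = (p_i'\ \ p_i\ \ {-2p_i''}\ \ n_ip_i'\ \ n_ip_i\ \ {-2n_ip_i''}) = t_i, \qquad \partial'\partial r_i = T_i, \quad \text{say}, $$

and note that

$$ \partial e = (1\ \ 0\ \ 0\ \ 0\ \ 0\ \ 0), \qquad \partial p_i = (1\ \ 0\ \ 0\ \ n_i\ \ 0\ \ 0), $$

with similar results for $e'$, $e''$, $p_i'$ and $p_i''$. The terms in (24) can then be written in terms of these vectors and matrices. All second order derivatives of the $e$'s and $p$'s are zero. Using these terms and the expected values indicated in (23), it can be shown after a little reduction that

$$ -L_2 = \frac{N - c}{2q^2}(s's - qS) + \frac12\sum_i\frac{t_i't_i - r_iT_i}{r_i^2}. $$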

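This has now to be inverted to give the variance-covariance matrix of the maximum likelihood estimates. If we define

$$ E = \begin{pmatrix} e'^2 & e''^2 & -2e'e'' \\ e''^2 & e^2 & -2ee'' \\ -2e'e'' & -2ee'' & 2(ee' + e''^2) \end{pmatrix} $$

and similarly $P_i$ in terms of the $p_i$'s, then $-L_2$ can be written as

$$ -L_2 = \frac12\begin{pmatrix} \dfrac{N - c}{q^2}E + \sum_i\dfrac{P_i}{r_i^2} & \sum_i\dfrac{n_iP_i}{r_i^2} \\ \sum_i\dfrac{n_iP_i}{r_i^2} & \sum_i\dfrac{n_i^2P_i}{r_i^2} \end{pmatrix}. \tag{25} $$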







6.5. Analytic inversion of the matrix in this form does not seem possible. To make use of it in applications one must at this stage resort to arithmetical methods:

- replace the components by their estimates;
- compute the matrix as it stands;
- invert it, either directly or by a method of partitioning [5].

For calculating $L_2$, 24 terms must be computed. Six of these are the squares and products of the $e$'s multiplied by $(N - c)/q^2$. The remaining 18 are the sums of squares and products of the $p_i/r_i$ terms, weighted by 1, $n_i$ and $n_i^2$. The computing is facilitated by grouping together, at all stages, all classes having the same number of observations.
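The arithmetical route just described can be sketched as follows. The sketch assembles $-L_2$ from (25), grouping classes with equal $n_i$ as suggested, and inverts it with NumPy. As a check on the reconstruction of (25), it also computes the same information matrix by brute force from the full $2N \times 2N$ matrix $V_1$, using the standard Gaussian identity $\tfrac12\operatorname{trace}(V_1^{-1}\partial_kV_1\,V_1^{-1}\partial_lV_1)$. The design and component values are hypothetical.

```python
import numpy as np


def e_type(u, u1, u2):
    """E-type matrix of (25) for components (u, u', u''), ordered (e, e', e'')."""
    return np.array([[u1**2, u2**2, -2 * u1 * u2],
                     [u2**2, u**2, -2 * u * u2],
                     [-2 * u1 * u2, -2 * u * u2, 2 * (u * u1 + u2**2)]])


def neg_L2(sizes, e, e1, e2, a, a1, a2):
    """-L_2 of (25); parameter order (e, e', e'', a, a', a'')."""
    N, c = sum(sizes), len(sizes)
    q = e * e1 - e2**2
    tl = (N - c) * e_type(e, e1, e2) / q**2
    tr, br = np.zeros((3, 3)), np.zeros((3, 3))
    for n_i, count in zip(*np.unique(sizes, return_counts=True)):  # group equal n_i (6.5)
        p, p1, p2 = e + n_i * a, e1 + n_i * a1, e2 + n_i * a2
        r = p * p1 - p2**2
        P = count * e_type(p, p1, p2) / r**2
        tl, tr, br = tl + P, tr + n_i * P, br + n_i**2 * P
    return 0.5 * np.block([[tl, tr], [tr, br]])


def fisher_direct(sizes, e, e1, e2, a, a1, a2):
    """Expected information 0.5 trace(V1^-1 dV1 V1^-1 dV1) from the full 2N x 2N V_1."""
    labels = np.repeat(np.arange(len(sizes)), sizes)
    B = (labels[:, None] == labels[None, :]).astype(float)
    I, Z = np.eye(len(labels)), np.zeros_like(B)
    V1 = np.block([[e * I + a * B, e2 * I + a2 * B], [e2 * I + a2 * B, e1 * I + a1 * B]])
    Vi = np.linalg.inv(V1)
    derivs = [np.block([[I, Z], [Z, Z]]),   # d/de
              np.block([[Z, Z], [Z, I]]),   # d/de'
              np.block([[Z, I], [I, Z]]),   # d/de''
              np.block([[B, Z], [Z, Z]]),   # d/da
              np.block([[Z, Z], [Z, B]]),   # d/da'
              np.block([[Z, B], [B, Z]])]   # d/da''
    return np.array([[0.5 * np.trace(Vi @ Dk @ Vi @ Dl) for Dl in derivs] for Dk in derivs])


sizes = [2, 3, 3, 5]                            # hypothetical class sizes
params = (1.0, 2.0, 0.5, 1.5, 0.7, 0.4)         # e, e', e'', a, a', a''
info = neg_L2(sizes, *params)
cov = np.linalg.inv(info)                        # large-sample variance-covariance matrix
print(np.round(cov, 4))
```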

6.6. Due to the symmetric nature of $P_i$, the upper right quadrant of $L_2^{-1}$ is symmetric, and since $L_2$ is symmetric the lower left quadrant is too. For example, this means that two large-sample covariances between maximum likelihood estimates are equal:

- the covariance between the between-classes component of variance of $x$ and the within-classes component of variance of $y$;
- the covariance between the between-classes component of variance of $y$ and the within-classes component of variance of $x$.

That is,

$$ \operatorname{cov}(\hat a, \hat e') = \operatorname{cov}(\hat a', \hat e), $$

and similarly

$$ \operatorname{cov}(\hat a'', \hat e) = \operatorname{cov}(\hat a, \hat e''), \qquad \operatorname{cov}(\hat a'', \hat e') = \operatorname{cov}(\hat a', \hat e''). $$


6.7. Suppose two variables have a bivariate normal distribution with variances $\sigma_1^2$ and $\sigma_2^2$ and unknown correlation coefficient $\rho$. It can be shown that the large-sample variance of the maximum likelihood estimate of $\sigma_i^2$ (where $\rho$ is estimated also) is $2\sigma_i^4/n$ $(i = 1, 2)$. This is the same result as when the two variables are assumed independent. Generalizing this to the case which we have considered, it can be seen that the values in the inverse of the matrix (25) appropriate to the variance components $e, a$ and $e', a'$ will be the same as the expression (11). The matrix, however, gives further information about the covariance components $e''$ and $a''$. It also gives the large-sample covariances among the maximum likelihood estimates of all six parameters.


7. Conclusion. Henderson [4] has shown how components of variance can be estimated from unbalanced data in an n-way classification. He states that sampling variances of such estimates are unknown; this is certainly true for n greater than one. This paper presents a matrix method suitable for finding the variances in the known case, the 1-way classification (under the assumptions of Eisenhart's Model II), with a view to extending it to higher classifications. As a first step the method has been shown to give results for the covariance case in a 1-way classification. It would seem that the 2-way classification for components of variance can be handled in a similar fashion.





REFERENCES 


[1] S. L. Crump, "Present status of variance component analysis," Biometrics, Vol. 7 (1951), p. 1.
[2] S. L. Crump, "Estimation of variance components in analysis of variance," Biometrics Bull., Vol. 2 (1946), p. 7.
[3] C. Eisenhart, "The assumptions underlying analysis of variance," Biometrics, Vol. 3 (1947), p. 1.
[4] C. R. Henderson, "Estimation of variance and covariance components," Biometrics, Vol. 9 (1953), p. 226.
[5] H. Hotelling, "Some new methods in matrix calculation," Ann. Math. Stat., Vol. 14 (1943), p. 1.
[6] H. O. Lancaster, "Traces and cumulants of quadratic forms in normal variables," J. Roy. Stat. Soc. (B), Vol. 16 (1954), p. 247.
[7] Turnbull and Aitken, Theory of Canonical Matrices, Blackie & Sons, 1932, p. 174.
[8] F. V. Waugh, "A note concerning Hotelling's method of inverting a partitioned matrix," Ann. Math. Stat., Vol. 16 (1945), p. 216.
[9] P. Whittle, Hypothesis Testing in Time Series Analysis, Almqvist & Wiksells, Uppsala, 1951.
[10] P. Whittle, "The analysis of multiple stationary time series," J. Roy. Stat. Soc. (B), Vol. 15 (1953), p. 125.





ON THE HYPOTHESIS OF NO “INTERACTION” IN A MULTI-WAY 
CONTINGENCY TABLE 


By S. N. ROY AND MARVIN A. KASTENBAUM


University of North Carolina 


1. Summary. Consider a situation in which the observations are frequencies in a multi-way contingency table. The observations are supposed to be independent, and only the total number is supposed to be fixed from sample to sample. For this situation a hypothesis on the structure of the probabilities in the different cells or categories is put forward. By a certain analogy with the customary terminology of analysis of variance, this hypothesis is defined to be the hypothesis of “no interaction”, and a large sample test of it in terms of $\chi^2$ is offered. Bartlett's results [1] for the case of a $2 \times 2 \times 2$ table and Norton's results [5] for the case of a $2 \times 2 \times t$ table formally turn out to be special cases of the results given here, with these differences:

(i) Bartlett's and Norton's results refer to “analysis of variance” situations, with marginal frequencies along at least two ways of the table being fixed. In this paper, for reasons explained elsewhere [7], only the total $n$ is held fixed.

(ii) Bartlett's and Norton's papers do not give any indication of the mechanism behind the formulae for the hypothesis of “no interaction”. This paper attempts to give a definite mathematical (and perhaps also physical) mechanism behind the formulae.


2. Preliminaries and the actual construction of the hypothesis of “no interaction”. To fix our ideas, consider a sample of fixed size $n$ of independent observations distributed in a three-way table. Let $n_{ijk}$ denote the observed frequency, and $p_{ijk}$ the probability, in the $(ijk)$th cell, where $i = 1, 2, \dots, r$; $j = 1, 2, \dots, s$; $k = 1, 2, \dots, t$. Also let the marginals be denoted by

$$ \begin{aligned} &\textstyle\sum_i n_{ijk} = n_{0jk}, \quad \sum_j n_{ijk} = n_{i0k}, \quad \sum_k n_{ijk} = n_{ij0}, \\ &\textstyle\sum_{i,j} n_{ijk} = n_{00k}, \quad \sum_{i,k} n_{ijk} = n_{0j0}, \quad \sum_{j,k} n_{ijk} = n_{i00}, \quad \sum_{i,j,k} n_{ijk} = n \text{ (say)}. \end{aligned} $$

Let the corresponding summations over $p_{ijk}$ be denoted by $p_{0jk}$, $p_{i0k}$, $p_{ij0}$, $p_{00k}$, $p_{0j0}$, $p_{i00}$, $p_{000}$. Since the categories are mutually exclusive and exhaustive, it is easy to see that these are, in fact, the marginal probabilities, so that $p_{000} = 1$. The generalization to more than three variates would be obvious. The likelihood function, which in this case is also the probability of the $n_{ijk}$'s, is given by

$$ \phi(n_{ijk}\text{'s}) = \frac{n!}{\prod_{i,j,k} n_{ijk}!}\prod_{i,j,k} p_{ijk}^{n_{ijk}} \propto \prod_{i,j,k} p_{ijk}^{n_{ijk}}. \tag{2.1} $$

The last expression on the right side of (2.1) is the one we shall need [2, 4] when we are interested in finding the maximum likelihood estimates of the $p$'s.








Received June 7, 1955. (Revised November 10, 1955.)







Hypothesis of independence between $(i, j)$ and $k$, that is, the hypothesis of multiple independence. Consider

$$ H_0: p_{ijk} = p_{ij0}\,p_{00k} \quad (i = 1, \dots, r;\ j = 1, \dots, s;\ k = 1, \dots, t), \tag{2.2} $$

the alternative being, of course, $H \ne H_0$. It is easy to check, by summing over $i$ and $j$ respectively, that (2.2) implies

$$ p_{0jk} = p_{0j0}\,p_{00k} \quad\text{and}\quad p_{i0k} = p_{i00}\,p_{00k}. \tag{2.3} $$

Summing over $k$ we have merely the consistency condition

$$ p_{ij0} = p_{ij0}. \tag{2.4} $$


Notice that although (2.2) implies (2.3), the condition (2.3) will not, in general, imply (2.2). However, for a normal population, (2.3) implies (2.2). Let us ask ourselves what set of conditions, when superimposed on (2.3), will together be exactly equivalent to (2.2). One possible set might appear to be

$$ H_0: p_{ijk} = \frac{p_{ij0}\,p_{i0k}\,p_{0jk}}{p_{i00}\,p_{0j0}\,p_{00k}} \quad (i = 1, \dots, r;\ j = 1, \dots, s;\ k = 1, \dots, t). \tag{2.5} $$

Check that (2.5) does not imply (2.2), but if on (2.5) we superimpose (2.3), we have (2.2) all right. But (2.5) would be mathematically most difficult to handle, because the parameters on its right side are subject to sets of side conditions. Typical among them is the set obtained by summing (2.5) over $k$,

$$ \sum_k\frac{p_{i0k}\,p_{0jk}}{p_{00k}} = p_{i00}\,p_{0j0}, \tag{2.6} $$

and there are other such sets obtained by permuting the subscripts. In fact, (2.5) was tried and was found to be intractable.

Physically a less natural and more abstract, but mathematically a much easier, set of conditions seems to be

$$ H_0: p_{ijk} = \frac{q_{ij0}\,q_{i0k}\,q_{0jk}}{q_{i00}\,q_{0j0}\,q_{00k}} \quad (i = 1, \dots, r;\ j = 1, \dots, s;\ k = 1, \dots, t), \tag{2.7} $$

where we do not assume that $q_{ij0} = p_{ij0}$, etc., nor even that a $q$ with zero subscripts is the corresponding sum of other $q$'s. Equation (2.7), after elimination of the $q$'s, leads to a number of constraints on the $p$'s themselves. It is easier to estimate the $p$'s subject to these constraints and to $\sum_{i,j,k} p_{ijk} = 1$ than to try to estimate the $q$'s. The only role of the $q$'s and of the hypothesis (2.7) is to yield certain constraints on the $p$'s themselves. It will be shown in Sections 3 and 4 that (2.7) is equivalent to just $(r - 1)(s - 1)(t - 1)$ constraints on the $p_{ijk}$'s. Together with $\sum_{i,j,k} p_{ijk} = 1$, these make just $(r - 1)(s - 1)(t - 1) + 1$ constraints. Notice that in this case we do not have constraints like (2.6) which, in practice, turn out to be quite awkward.

It is clear that so far as the functional form is concerned we can, without any loss of generality, replace (2.7) by just

$$ H_0: p_{ijk} = q_{ij0}\,q_{i0k}\,q_{0jk} \quad (i = 1, \dots, r;\ j = 1, \dots, s;\ k = 1, \dots, t). \tag{2.8} $$

Theorem. $(2.3) \cap [(2.7) \text{ or } (2.8)] \Leftrightarrow (2.2)$.

Proof. A straightforward proof, in which everything is spelled out, is given for the case of a $2 \times 2 \times 2$ table on pages 71 and 72 of reference [3]. A similar proof has been obtained for the general $r \times s \times t$ table and will be published shortly. The following, however, is another proof based on a general type of reasoning.

Starting from (2.8), summing over $j$ and over $i$, and using (2.3), we have, respectively,

$$ p_{i0k} = p_{i00}\,p_{00k} = q_{i0k}\sum_j q_{ij0}\,q_{0jk}, \tag{2.9} $$

$$ p_{0jk} = p_{0j0}\,p_{00k} = q_{0jk}\sum_{i'} q_{i'j0}\,q_{i'0k}. \tag{2.10} $$

Substituting in (2.9) for $q_{0jk}$ from (2.10), we have

$$ p_{i00}\,p_{00k} = q_{i0k}\sum_j\frac{q_{ij0}\,p_{0j0}\,p_{00k}}{\sum_{i'} q_{i'j0}\,q_{i'0k}}. \tag{2.11} $$

So far as the functional form is concerned, this equation, without any loss of generality, can be replaced (on cancelling $p_{00k}$) by

$$ p_{i00} = q_{i0k}\sum_j\frac{q_{ij0}\,p_{0j0}}{\sum_{i'} q_{i'j0}\,q_{i'0k}}. \tag{2.12} $$

Now suppose that we regard (2.12) as a set of equations in $q_{i0k}$ (with $i = 1, \dots, r$ and $k = 1, \dots, t$) in which the $q_{ij0}$'s act as a set of given parameters. Then:

(i) it is clear from (2.12) that any $q_{i0k}$ will depend on the whole set of $q_{ij0}$'s, the form of this dependence varying possibly with $i$ but obviously not with $k$;

(ii) if $q_{i0k}$ $(i = 1, \dots, r;\ k = 1, \dots, t)$ is a solution set, $f_3(k)q_{i0k}$ is also a solution set, where $f_3(k)$ is any function of $k$ alone.

Together, (i) and (ii) show that so far as the functional form is concerned

$$ q_{i0k} = f_1(i)\,f_3(k), \tag{2.13} $$

and likewise

$$ q_{0jk} = f_2(j)\,f_4(k). \tag{2.14} $$

Combining the two with (2.8) we can, without any loss of generality, write

$$ p_{ijk} = f(k)\,q_{ij0} = q_{ij0}\,q_{00k} \quad \text{(say)}. \tag{2.15} $$




Summing over $k$ and over $(i, j)$ we have, respectively,

$$ p_{ij0} = q_{ij0}\sum_k q_{00k} = q_{ij0}\,q'' \ \text{(say)} \quad\text{and}\quad p_{00k} = q_{00k}\sum_{i,j} q_{ij0} = q_{00k}\,q' \ \text{(say)}. \tag{2.16} $$

Summing up over any one of these two sets of relations, we have

$$ 1 = q'q''. \tag{2.17} $$

Substituting back from (2.16) and (2.17) in (2.15), we have

$$ p_{ijk} = p_{ij0}\,p_{00k}. \tag{2.18} $$

Notice that if in (2.7) we were to replace $(i, j, k)$ by $(x, y, z)$, then (2.7) would be found to imply

$$ f(x, y, z) = \frac{f_1(x, y)\,f_2(x, z)\,f_3(y, z)}{F_1(x)\,F_2(y)\,F_3(z)}, \tag{2.19} $$

with nothing else connecting $f_1, f_2, f_3, F_1, F_2, F_3$ among themselves or with $f$.

The hypothesis (2.2) is the natural analogue of the hypothesis of “no multiple correlation” between $(i, j)$ and $k$. Likewise, (2.3) is the natural analogue of the hypotheses of no correlation between $i$ and $k$ and between $j$ and $k$. Thus the hypothesis (2.7) is, as it were, a kind of bridge over the gap between (2.3) and (2.2). By a certain analogy with “normal variate” analysis of variance we can call it the hypothesis of “no interaction” between $i$ and $j$. “Normal variate” multivariate analysis doesn't have any concept like this, because there this situation does not arise.


3. A large sample test of (2.7) in terms of $\chi^2$ [2, 6]. We start from (2.1) and maximize $\phi$ with respect to the $p_{ijk}$'s subject to $\sum_{i,j,k} p_{ijk} = 1$. We also impose the constraints that we would get by eliminating the $q$'s between the equations (2.7). We [2] end up with a number of solutions of the maximum likelihood equations subject to constraints. Among these there is one and only one solution set, say $\hat p_{ijk}$'s, having the following properties under (2.7):

(i) in large samples, $(\hat p_{ijk}\text{'s}) \to (\text{true } p_{ijk}\text{'s})$ in probability;

(ii) $\sum_{i,j,k}(n_{ijk} - n\hat p_{ijk})^2/n\hat p_{ijk}$ has approximately the $\chi^2$-distribution, with degrees of freedom equal to the number of constraints on the $p$'s that arise by eliminating the $q$'s among (2.7).

To fix our ideas we shall first consider the case of a $2 \times 2 \times 2$ table and then the general $r \times s \times t$ table.

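“No interaction” in a $2 \times 2 \times 2$ table. Consider in this case the hypothesis (2.7), and write it out in full as follows:

$$ \begin{aligned} H_0:\ p_{111} &= \frac{q_{110}q_{101}q_{011}}{q_{100}q_{010}q_{001}}, & p_{211} &= \frac{q_{210}q_{201}q_{011}}{q_{200}q_{010}q_{001}}, \\ p_{112} &= \frac{q_{110}q_{102}q_{012}}{q_{100}q_{010}q_{002}}, & p_{212} &= \frac{q_{210}q_{202}q_{012}}{q_{200}q_{010}q_{002}}, \\ p_{121} &= \frac{q_{120}q_{101}q_{021}}{q_{100}q_{020}q_{001}}, & p_{221} &= \frac{q_{220}q_{201}q_{021}}{q_{200}q_{020}q_{001}}, \\ p_{122} &= \frac{q_{120}q_{102}q_{022}}{q_{100}q_{020}q_{002}}, & p_{222} &= \frac{q_{220}q_{202}q_{022}}{q_{200}q_{020}q_{002}}. \end{aligned} \tag{3.1} $$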







It is easy to check that by eliminating the $q$'s we obtain what we will call the “no interaction” constraints. In this case they represent just one relation among the $p$'s, namely,

$$ \frac{p_{111}\,p_{221}}{p_{211}\,p_{121}} = \frac{p_{112}\,p_{222}}{p_{212}\,p_{122}}. \tag{3.2} $$

This is Bartlett's hypothesis of “no interaction” discussed in [1]. There is, of course, the other side condition on the $p$'s:

$$ \sum_{i,j,k} p_{ijk} = 1. \tag{3.3} $$

Recalling again Section 2, the likelihood function can be written as

$$ \phi \propto \prod_{i,j,k} p_{ijk}^{n_{ijk}}. \tag{3.4} $$


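The problem is to estimate the $p_{ijk}$'s by maximizing $\phi$ subject to the constraints (3.2) and (3.3). Introducing the usual Lagrangian multipliers $\lambda$ and $\mu$ on (3.2) and (3.3), we have the maximum likelihood equations

$$ \begin{aligned} \frac{n_{ijk}}{p_{ijk}} + \frac{\lambda}{p_{ijk}} + \mu &= 0 \quad (ijk = 111, 221, 212, 122), \\ \frac{n_{ijk}}{p_{ijk}} - \frac{\lambda}{p_{ijk}} + \mu &= 0 \quad (ijk = 112, 222, 211, 121). \end{aligned} \tag{3.5} $$

Now multiplying by $p_{ijk}$, summing over $i, j, k$ and using (3.3), we have $\mu = -n$ and

$$ \begin{aligned} \hat p_{ijk} &= (n_{ijk} + \lambda)/n \quad (ijk = 111, 221, 212, 122), \\ \hat p_{ijk} &= (n_{ijk} - \lambda)/n \quad (ijk = 112, 222, 211, 121). \end{aligned} \tag{3.6} $$

Substituting in (3.2), we have for $\lambda$ the cubic equation (the $\lambda^4$ terms cancel on cross-multiplication)

$$ \frac{(n_{111} + \lambda)(n_{221} + \lambda)}{(n_{211} - \lambda)(n_{121} - \lambda)} = \frac{(n_{112} - \lambda)(n_{222} - \lambda)}{(n_{212} + \lambda)(n_{122} + \lambda)}. \tag{3.7} $$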





There is one and only one root [4] of this equation which yields an estimate that tends in probability to the true population parameter point and leads to a $\chi^2$-distribution. It will be shown in a later paper that the numerically smallest (real) root of (3.7) is the one which satisfies this condition.

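Solving for $\lambda$ and substituting in (3.6), we have the estimated $\hat p_{ijk}$'s occurring in the usual $\chi^2$. Since

$$ \begin{aligned} n_{ijk} - n\hat p_{ijk} &= -\lambda \quad (ijk = 111, 221, 212, 122), \\ n_{ijk} - n\hat p_{ijk} &= +\lambda \quad (ijk = 112, 222, 211, 121), \end{aligned} \tag{3.8} $$

the final $\chi^2$ is given by

$$ \chi^2 = \lambda^2\sum_{i,j,k}\frac{1}{n\hat p_{ijk}}. \tag{3.9} $$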
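The computation in (3.5) to (3.9) can be sketched in a few lines of Python. The sketch assumes the root-selection rule stated above: take the real root of (3.7) of smallest absolute value. The function name and example counts are hypothetical. Cells are indexed from 0, so cell 111 is `(0, 0, 0)`.

```python
import numpy as np

# Cells whose multiplier enters with +lambda in (3.6): 111, 221, 212, 122 (0-based below).
PLUS = [(0, 0, 0), (1, 1, 0), (1, 0, 1), (0, 1, 1)]
# Cells whose multiplier enters with -lambda: 112, 222, 211, 121.
MINUS = [(0, 0, 1), (1, 1, 1), (1, 0, 0), (0, 1, 0)]


def no_interaction_test(counts):
    """Return (lambda, p_hat, chi2) for a 2x2x2 table of counts under (3.2)."""
    counts = np.asarray(counts, dtype=float)
    n = counts.sum()
    lhs, rhs = np.array([1.0]), np.array([1.0])
    for cell in PLUS:
        lhs = np.polymul(lhs, [1.0, counts[cell]])    # factor (n_ijk + lambda)
    for cell in MINUS:
        rhs = np.polymul(rhs, [-1.0, counts[cell]])   # factor (n_ijk - lambda)
    # Cross-multiplied (3.7); the lambda^4 terms cancel, leaving a cubic.
    roots = np.roots(np.polysub(lhs, rhs))
    real_roots = roots[np.abs(roots.imag) < 1e-9].real
    lam = real_roots[np.argmin(np.abs(real_roots))]   # numerically smallest real root
    p_hat = np.empty_like(counts)
    for cell in PLUS:
        p_hat[cell] = (counts[cell] + lam) / n        # (3.6)
    for cell in MINUS:
        p_hat[cell] = (counts[cell] - lam) / n
    chi2 = lam**2 * np.sum(1.0 / (n * p_hat))         # (3.9)
    return lam, p_hat, chi2


table = np.full((2, 2, 2), 10.0)
table[0, 0, 0] = 20.0                                  # hypothetical counts with an interaction
lam, p_hat, chi2 = no_interaction_test(table)
print(lam, chi2)
```

If the counts already satisfy (3.2), for example a table with every count equal, the selected root is $\lambda = 0$ and $\chi^2 = 0$.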







This will be a $\chi^2$ whose d.f. are computed as follows:

d.f. = (total number of cells, 8 here) − [(apparent number of parameters, 8 here) − (number of “no interaction” constraints, 1 here) − (number of linear relations on the $p$'s coming from the linear constraints on the $n$'s, 1 here)] − (number of linear relations on the $n$'s, 1 here).

This equals the number of “no interaction” constraints, which is 1 in this case. It was shown in [4] that the same holds in all cases: whether $i$, $j$ and $k$ are all “variates”, or some are “variates” and some are “ways of classification”, or all are “ways of classification”. In each case we end up with a $\chi^2$ with d.f. exactly equal to the number of “no interaction” constraints like (3.2).

Notice that in (3.5) the Lagrangian $\mu$ goes with the constraint $\sum_{i,j,k} p_{ijk} = 1$, which stems from $\sum_{i,j,k} n_{ijk} = n$. The Lagrangian $\lambda$ goes with the “no interaction” constraint (3.2).


4. “No interaction” in an $r \times s \times t$ table. Let us consider here the hypothesis of “no interaction” and try to eliminate the $q$'s. To fix our ideas, consider first the case of a $2 \times 2 \times t$ table. Looking into the mechanics by which (3.2) is obtained from (3.1), it is easy to see that, corresponding to (3.2), we are going to have

$$ \frac{p_{11t}\,p_{22t}}{p_{21t}\,p_{12t}} = \frac{p_{11,t-1}\,p_{22,t-1}}{p_{21,t-1}\,p_{12,t-1}} = \frac{p_{11,t-2}\,p_{22,t-2}}{p_{21,t-2}\,p_{12,t-2}} = \cdots = \frac{p_{111}\,p_{221}}{p_{211}\,p_{121}}. \tag{4.1} $$

For a general $r \times s \times t$ table we can figure out that we are going to have the following “no interaction” constraints:

$$ \frac{p_{rst}\,p_{ijt}}{p_{ist}\,p_{rjt}} = \frac{p_{rsk}\,p_{ijk}}{p_{isk}\,p_{rjk}} \quad (i = 1, \dots, r - 1;\ j = 1, \dots, s - 1;\ k = 1, \dots, t - 1). \tag{4.2} $$

This gives us $(t - 1)(s - 1)(r - 1)$ constraints on the $p_{ijk}$'s. Checking the mechanics of the derivation of (4.2) from (4.1), it will be seen that (4.2) yields a set of independent and exhaustive relations among the $p$'s by eliminating the $q$'s from (2.7). Here $p_{rst}$ is, as it were, a pivotal element, and $r$, $s$ and $t$ the pivotal subscripts. We can make any other three subscripts the pivotal ones, and thus obtain another set of independent and exhaustive relations like (4.2), which would be exactly equivalent to (4.2), and so on.
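The claim that (2.7) yields exactly the constraints (4.2) can be checked numerically. The sketch below builds a table from arbitrary positive $q$'s as in (2.7), with the last level of each subscript as the pivot. It evaluates the $(r - 1)(s - 1)(t - 1)$ constraints in log form and confirms that they all vanish. It then confirms that a table not of the form (2.7) violates them. The dimensions, the seed and the helper name are hypothetical.

```python
import numpy as np


def no_interaction_residuals(p):
    """Log-form residuals of (4.2), pivot = last level of each subscript.

    Entry [i, j, k] (i < r-1, j < s-1, k < t-1) is
    log[p_rst p_ijt / (p_ist p_rjt)] - log[p_rsk p_ijk / (p_isk p_rjk)].
    """
    L = np.log(p)
    R, S, T = (dim - 1 for dim in p.shape)          # 0-based pivotal subscripts r, s, t
    left = (L[:R, :S, T][:, :, None] + L[R, S, T]
            - L[:R, S, T][:, None, None] - L[R, :S, T][None, :, None])
    right = (L[:R, :S, :T] + L[R, S, :T][None, None, :]
             - L[:R, S, :T][:, None, :] - L[R, :S, :T][None, :, :])
    return left - right


rng = np.random.default_rng(1)
r, s, t = 3, 4, 2                                   # hypothetical table dimensions
# Arbitrary positive q's combined as in (2.7), then normalized to sum to 1.
q_ij0, q_i0k, q_0jk = rng.uniform(0.5, 2, (r, s)), rng.uniform(0.5, 2, (r, t)), rng.uniform(0.5, 2, (s, t))
q_i00, q_0j0, q_00k = rng.uniform(0.5, 2, r), rng.uniform(0.5, 2, s), rng.uniform(0.5, 2, t)
p = (q_ij0[:, :, None] * q_i0k[:, None, :] * q_0jk[None, :, :]
     / (q_i00[:, None, None] * q_0j0[None, :, None] * q_00k[None, None, :]))
p /= p.sum()

res = no_interaction_residuals(p)
print(res.shape, np.abs(res).max())                 # (r-1, s-1, t-1) constraints, all ~0

p_free = rng.dirichlet(np.ones(r * s * t)).reshape(r, s, t)   # a table not of form (2.7)
print(np.abs(no_interaction_residuals(p_free)).max())
```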
Our likelihood function is

$$ \phi \propto \prod_{i,j,k} p_{ijk}^{n_{ijk}}. \tag{4.3} $$

Here we have to maximize (4.3) subject to the “no interaction” constraints (4.2) and the further constraint

$$ \sum_{i,j,k} p_{ijk} = 1. \tag{4.4} $$


Introducing for (4.2) the Lagrangian multipliers $\lambda_{ijk}$ $[i = 1, 2, \dots, (r - 1);\ j = 1, 2, \dots, (s - 1);\ k = 1, 2, \dots, (t - 1)]$ and for (4.4) the Lagrangian multiplier $\mu$, and maximizing (4.3), we have for the $p_{ijk}$ the typical equations

$$ \begin{aligned} \frac{n_{rst}}{p_{rst}} + \frac{1}{p_{rst}}\sum_{i=1}^{r-1}\sum_{j=1}^{s-1}\sum_{k=1}^{t-1}\lambda_{ijk} + \mu &= 0, \\ \frac{n_{ist}}{p_{ist}} - \frac{1}{p_{ist}}\sum_{j=1}^{s-1}\sum_{k=1}^{t-1}\lambda_{ijk} + \mu &= 0 \quad (\text{similarly for } p_{rjt} \text{ and } p_{rsk}), \\ \frac{n_{ijt}}{p_{ijt}} + \frac{1}{p_{ijt}}\sum_{k=1}^{t-1}\lambda_{ijk} + \mu &= 0 \quad (\text{similarly for } p_{isk} \text{ and } p_{rjk}), \\ \frac{n_{ijk}}{p_{ijk}} - \frac{\lambda_{ijk}}{p_{ijk}} + \mu &= 0, \end{aligned} \tag{4.5} $$

with, of course, $i = 1, 2, \dots, (r - 1)$; $j = 1, 2, \dots, (s - 1)$; $k = 1, 2, \dots, (t - 1)$. The pattern of sums and signs is as follows:

- with the pivotal subscripts $(rst)$ goes a triple summation over the $\lambda$'s, with a positive sign before it;
- with just one subscript changed goes a double summation over the $\lambda$'s, with a negative sign;
- with two of the subscripts changed goes a single summation over the $\lambda$'s, with a positive sign;
- with all the subscripts changed we have a single $\lambda_{ijk}$, with a negative sign before it.

As in the case of the $2 \times 2 \times 2$ table, it is easy to see, by multiplying both sides of (4.5) by $p_{ijk}$ and summing over $i, j, k$, that $\mu = -n$. We then solve for the $p_{ijk}$'s in terms of the $n_{ijk}$'s and $\lambda_{ijk}$'s and substitute in the “no interaction” constraints (4.2). This gives for the $\lambda_{ijk}$ the following equations $[i = 1, 2, \dots, (r - 1);\ j = 1, 2, \dots, (s - 1);\ k = 1, 2, \dots, (t - 1)]$:

$$ \frac{(n_{rst} + \mu_{rst})(n_{ijt} + \mu_{ijt})}{(n_{ist} - \mu_{ist})(n_{rjt} - \mu_{rjt})} = \frac{(n_{rsk} - \mu_{rsk})(n_{ijk} - \mu_{ijk})}{(n_{isk} + \mu_{isk})(n_{rjk} + \mu_{rjk})}. \tag{4.10} $$

Here the $\mu$'s with subscripts stand for the summation expressions in (4.5):

- $\mu_{rst}$ is the triple summation;
- $\mu_{ist}$, $\mu_{rjt}$ and $\mu_{rsk}$ are the double summations;
- $\mu_{ijt}$, $\mu_{isk}$ and $\mu_{rjk}$ are the single summations;
- $\mu_{ijk}$ is simply $\lambda_{ijk}$.

As observed in connection with (3.7), here also there is one and only one solution [4] of these equations which yields an estimate that tends in probability to the true population parameter point and leads to a $\chi^2$-distribution. It will be shown in a later paper that this is the (real) solution for which the distance from the origin in the space of the $\mu_{ijk}$'s is the least. Solving equations (4.10) for the $\mu_{ijk}$'s, and ultimately for the $\lambda_{ijk}$'s, in terms of the $n_{ijk}$'s, we can find the $\hat p_{ijk}$'s. Substituting these values in the usual expression for $\chi^2$ we have

$$ \chi^2 = \sum_{i,j,k}\mu_{ijk}^2/(n_{ijk} + \eta_{ijk}\mu_{ijk}), \tag{4.11} $$







where $i = 1, 2, \dots, r$; $j = 1, 2, \dots, s$; $k = 1, 2, \dots, t$; and where

$\eta_{ijk} = +1$ if $ijk = rst$ (the pivotal subscripts);

$\eta_{ijk} = -1$ if any one subscript differs from the corresponding pivotal subscript;

$\eta_{ijk} = +1$ if any two subscripts differ from the corresponding pivotal subscripts;

$\eta_{ijk} = -1$ if all subscripts differ from the corresponding pivotal subscripts.
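The sketch below illustrates the bookkeeping behind (4.5) and (4.11). It does not solve (4.10); the multipliers are arbitrary trial values. For each cell it forms $\mu_{ijk}$ as the sum of the $\lambda$'s attached to that cell and sets $\eta_{ijk} = (-1)^{\text{number of non-pivotal subscripts}}$. It then takes $n\hat p_{ijk} = n_{ijk} + \eta_{ijk}\mu_{ijk}$, consistent with $\mu = -n$. Two consequences are checked: the fitted counts always add up to $n$, because each $\lambda_{ijk}$ enters four cells with $+$ and four with $-$, and (4.11) agrees with the usual $\chi^2$. The counts, multipliers and helper name are hypothetical.

```python
import numpy as np


def fitted_counts(counts, lam):
    """Return (n * p_hat, eta, mu) with n * p_hat_ijk = n_ijk + eta_ijk * mu_ijk.

    `lam` has shape (r-1, s-1, t-1). mu_ijk sums the lambda's attached to cell
    (i, j, k): a pivotal subscript sums over that index, a non-pivotal one fixes it.
    """
    mu = lam
    for axis in range(3):
        mu = np.concatenate([mu, mu.sum(axis=axis, keepdims=True)], axis=axis)
    r, s, t = counts.shape
    i, j, k = np.ogrid[:r, :s, :t]
    changed = (i < r - 1).astype(int) + (j < s - 1) + (k < t - 1)
    eta = (-1.0) ** changed                    # +1 pivot, -1 one changed, +1 two, -1 all
    return counts + eta * mu, eta, mu


rng = np.random.default_rng(2)
counts = rng.integers(20, 60, size=(3, 3, 2)).astype(float)   # hypothetical counts
lam = rng.normal(scale=2.0, size=(2, 2, 1))                    # arbitrary trial multipliers
fitted, eta, mu = fitted_counts(counts, lam)
n = counts.sum()
chi2_411 = np.sum(mu**2 / (counts + eta * mu))                  # (4.11)
chi2_direct = np.sum((counts - fitted) ** 2 / fitted)           # usual chi-square
print(fitted.sum(), n, chi2_411, chi2_direct)
```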


Using [2, 5] it will be seen that the statistic (4.11) will be distributed as a $\chi^2$ with d.f. equal to the number of “no interaction” constraints on the $p$'s, namely $(r - 1)(s - 1)(t - 1)$. For several types of data on effects of exposure to radiation, the equations (4.10) and similar equations for some four-way tables have been solved on an electronic computer at Ann Arbor, Michigan. The method of steepest descent was used, supplemented by some numerical graphical procedures. The details and the final results for the actual data handled will be reported in a later paper.


5. Concluding remarks. The lines along which the concept and structure of the hypothesis of "no interaction" is to be generalized to multi-way contingency tables of higher dimensions can now be indicated. For example, in a four-way table the hypothesis analogous to (2.7), that is, the hypothesis of no "second-order interaction," seems to be

$$H_0:\ p_{ijkl} = \frac{\psi_{ijk0}\,\psi_{ij0l}\,\psi_{i0kl}\,\psi_{0jkl}\,\psi_{i000}\,\psi_{0j00}\,\psi_{00k0}\,\psi_{000l}}{\psi_{ij00}\,\psi_{i0k0}\,\psi_{i00l}\,\psi_{0jk0}\,\psi_{0j0l}\,\psi_{00kl}}. \tag{5.1}$$


In this case the hypotheses of four separate "no first-order interactions" follow exactly the same pattern as in Section 4, and need not be separately considered. The extension of (5.1) to higher-order "no interactions," in the case of tables of higher dimensions, forms a certain pattern which has been worked out and which will be discussed in a later paper. The technique of testing (5.1) and "no interaction" hypotheses of higher order is essentially similar, in principle, to what has been discussed in Section 4; only the details are more complicated. For higher-order "no interactions" there are, however, various intermediate cases of considerable interest which will be discussed later.

Going back to the three-way $r \times s \times t$ table again, it may be remarked [4] that we have an asymptotically equivalent test if we plug into the $\chi^2$-statistic any B.A.N. estimate of the $p_{ijk}$'s (consistent with (4.2) and (4.4)), and not just the maximum likelihood estimate of the $p_{ijk}$'s subject to (4.2) and (4.4). In particular, we can estimate [4] the $p_{ijk}$'s by minimizing the modified $\chi^2$ (sometimes called the $\chi_1^2$) subject to (4.2) and (4.4). However, if we use the $\chi_1^2$-statistic for estimation, a much better procedure would be (i) to define the "no interaction" condition as a "linearized" counterpart of (4.2); (ii) to estimate the $p_{ijk}$'s by minimizing $\chi_1^2$ subject to (4.4) and the "linearized" counterpart of (4.2); and (iii) to plug these estimates into $\chi_1^2$ itself and use the $\chi_1^2$-test, which is asymptotically equivalent to the $\chi^2$-test. This has been done in [3] and the material will be offered shortly for publication. Notice that (4.2) itself is a logarithmic linear hypothesis of the nature of a set of contrasts.





HYPOTHESIS OF NO “INTERACTION” 157 


This paper, unlike most previous work [1, 5, 6], discusses the hypothesis of "no interaction" in relation to the multivariate analysis situation only, that is, where only the total $n$ is fixed and no marginal frequencies are fixed. For analysis of variance situations, that is, when marginals along one or more directions are fixed, the authors do not find the "no interaction" concept too meaningful [7]. Nevertheless, starting more or less from the conditional probability setup (which can be justified for the specific cases considered), the authors have discussed in a previous paper [6] the "no interaction" hypothesis and its tests for the analysis of variance situations too, that is, where the marginal frequencies are fixed along one or more directions of the multi-way table. The formal structure of the hypothesis and the formal analysis remain the same as for the multivariate analysis situation discussed in this paper.

We give below a few references which have a direct bearing on this paper. 


REFERENCES

[1] M. S. Bartlett, "Contingency table interactions," J. Roy. Stat. Soc., Suppl., Vol. 2 (1935), pp. 248-252.

[2] H. Cramér, Mathematical Methods of Statistics, Princeton University Press, 1946, Chap. 30.

[3] S. K. Mitra, "Contributions to the statistical analysis of categorical data," Institute of Statistics, University of North Carolina, Mimeograph Series No. 142, 1955.

[4] J. Neyman, "Contribution to the theory of the $\chi^2$-test," Proceedings of the Berkeley Symposium on Mathematical Statistics and Probability, University of California Press, 1949.

[5] W. H. Norton, "Calculation of chi-square from complex contingency tables," J. Am. Stat. Assn., Vol. 40 (1945), pp. 251-258.

[6] S. N. Roy and M. Kastenbaum, "A generalization of analysis of variance and multivariate analysis to data based on frequencies in qualitative categories or class intervals," Institute of Statistics, University of North Carolina, Mimeograph Series No. 131, 1955.

[7] S. N. Roy and S. K. Mitra, "An introduction to some nonparametric generalizations of analysis of variance and multivariate analysis," Institute of Statistics, University of North Carolina, Mimeograph Series No. 139, 1955.








THE BIAS IN CERTAIN ESTIMATES OF THE PARAMETERS OF THE 
EXTREME-VALUE DISTRIBUTION 


By BRADFORD F. KIMBALL


New York State Public Service Commission 


Summary. This paper is mostly concerned with a modification of the maximum- 
likelihood estimate of the scale parameter of the extreme-value distribution for 
which the bias can be explicitly obtained. A formula for computing this bias is 
derived, and bias factors are tabulated for sample sizes from n = 2 to n = 112.
A brief comparison is made between this estimator and the optimum linear 
estimator for a sample of size n = 6. Attention is called to a bias which results 
from the maximum-likelihood estimate of the second parameter, and formulas 
for the bias and the variance of this estimate are obtained. In the concluding 
section, the significance of certain aspects of the maximum-likelihood estimate 
of the scale parameter in practical applications is briefly discussed. 


1. The equations for the maximum-likelihood estimate of the parameters. Given the extreme-value distribution of Type I [2]

$$\Phi(y) = \exp(-e^{-y}), \qquad y = a(x-u), \tag{1.1}$$

where $\Phi$ denotes the cdf, the equations for the maximum-likelihood estimate of the parameters may be reduced to [6]


$$\frac{1}{a} = \frac{\sum x}{n} - \frac{\sum x\,e^{-ax}}{\sum e^{-ax}}, \tag{1.2}$$

$$e^{-au} = \sum e^{-ax}/n, \tag{1.3}$$

where the summations are taken over the $n$ values of $x$ in the sample. The first of these equations, although it involves only the one unknown $a$, is somewhat intractable of solution. A variation on such a solution may be accomplished as follows:


2. Modification of the maximum-likelihood estimate of a. Substituting equation (1.3) in (1.2) one can write

$$\frac{1}{a} = \frac{\sum x}{n} - \frac{\sum x\,e^{-a(x-u)}}{n}. \tag{2.1}$$

One may further note that from (1.1), using "log" to represent natural logarithm,

$$e^{-a(x-u)} = e^{-y} = -\log\Phi.$$

Received June 27, 1955.

Introducing the subscript $i$ to denote a specific sample value, (2.1) becomes

$$\frac{1}{a} = \bar x + \sum_i x_i\log\Phi_i/n,$$

where $\bar x = \sum_i x_i/n$. So far this represents a transformation of the maximum-likelihood relations and hence a form of the maximum-likelihood estimate of the parameters. At this point it will be of interest to investigate the possible bias that would be incurred if $1/a$ were estimated from the right-hand side of the last equation under the assumption that the value of $\log\Phi_i$ were the true value for each $x_i$ (based on the true values of the population parameters). We denote such an estimate by $(1/a)_0$. Thus


$$(1/a)_0 = \bar x + \sum_i x_i\log\Phi_i/n = \bar x - \sum_i x_i e^{-y_i}/n, \tag{2.2}$$

and we note that the values of $a$ and $u$ which appear explicitly or implicitly on the right are the population parameters and not maximum-likelihood estimates of those parameters.

One seeks to evaluate $E[(1/a)_0]$. From the well-known relation

$$E[y] = E[a(x-u)] = \gamma = \text{Euler's constant } (= 0.577216),$$

one obtains

$$E[x] = u + \gamma/a. \tag{2.3}$$

It is also known that (see [6], p. 111)

$$E\Big[\sum_i (x_i - u)e^{-y_i}\Big] = -n(1-\gamma)/a \tag{2.4}$$

and that

$$E\Big[\sum_i e^{-y_i}\Big] = n.$$

It follows that

$$E\Big[\sum_i x_i e^{-y_i}/n\Big] = -(1-\gamma)/a + u. \tag{2.5}$$

Combining (2.3) and (2.5) in (2.2), we have

$$E[(1/a)_0] = 1/a. \tag{2.6}$$
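A quick Monte Carlo check of (2.6) can make the result tangible. The sketch below draws Type I samples with known $a$ and $u$, evaluates $(1/a)_0$ from (2.2) at those true values, and averages the results. It relies on the fact that $w = e^{-y}$ is unit exponential when $y$ follows (1.1). The seed, sample size, parameter values and replication count are arbitrary illustrative choices, and the function names are hypothetical.

```python
import math
import random


def inv_a_zero(xs, a, u):
    """(1/a)_0 of (2.2): xbar - sum_i x_i exp(-y_i) / n, evaluated at the TRUE a and u."""
    n = len(xs)
    return sum(xs) / n - sum(x * math.exp(-a * (x - u)) for x in xs) / n


def mean_inv_a_zero(a, u, n, reps, seed=12345):
    """Monte Carlo average of (1/a)_0 over reps Type I samples of size n (hypothetical helper)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        # If w ~ Exp(1), then y = -log w has the cdf (1.1), and x = u + y / a.
        xs = [u - math.log(rng.expovariate(1.0)) / a for _ in range(n)]
        total += inv_a_zero(xs, a, u)
    return total / reps
```

With a = 2 the average should settle near 1/a = 0.5, up to Monte Carlo error, for any choice of u and n.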


A modification of the relation (2.2) is proposed by substitution of $E[\log\Phi_i]$ for $\log\Phi_i$. For any well-behaved cdf it is known that

$$E[-\log\Phi_m] = \frac{1}{m} + \frac{1}{m+1} + \cdots + \frac{1}{n} = \psi(n+1) - \psi(m), \tag{2.7}$$

where $\Phi_m$ denotes the cumulative distribution function corresponding to the mth-ordered sample value, proceeding from the smallest $x = x_1$ to the largest $x = x_n$, and where $\psi(z)$ is the logarithmic derivative of the gamma function.¹

¹ This is a tabulated function; e.g., H. T. Davis, Tables of Higher Mathematical Functions, Vol. 1, Principia Press, Bloomington, Indiana, 1933.

There is undoubtedly some loss in efficiency in making the above substitution; at present we have not found it possible to measure this loss. The bias introduced by this substitution, however, can be measured, and knowledge of it is essential.


3. General equation for measuring the bias. Denoting $E[\log\Phi_m]$ by $\langle\log\Phi_m\rangle$ and replacing $1/a$ by $\beta$, the proposed equation for the estimate of $1/a$ is

$$\beta = \bar x + \sum_m x_m\langle\log\Phi_m\rangle/n. \tag{3.1}$$
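The estimator (3.1), with $\langle\log\Phi_m\rangle$ taken from (2.7), can be sketched in Python as follows. The function names are illustrative and do not come from the paper. The sample is sorted so that $x_1$ is the smallest value. Because $\sum_m\langle-\log\Phi_m\rangle = n$, the coefficients of the $x_m$ sum to zero. As a result, $\beta$ is unchanged when a constant is added to every observation, and it scales with the data.

```python
def expected_neg_log_phi(m, n):
    """(2.7): E[-log Phi_m] = 1/m + 1/(m+1) + ... + 1/n, m = 1 for the smallest value."""
    return sum(1.0 / j for j in range(m, n + 1))


def beta_estimate(sample):
    """(3.1): beta = xbar + sum_m x_m <log Phi_m> / n, with the sample sorted ascending."""
    xs = sorted(sample)
    n = len(xs)
    return sum(xs) / n - sum(x * expected_neg_log_phi(m, n)
                             for m, x in enumerate(xs, start=1)) / n
```

For the two-point sample {0, 1}, for example, the coefficients are -1/4 and +1/4, so $\beta = 1/4$ before any bias correction.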


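Since (2.2) does not produce a bias, the bias incurred by the estimate $\beta$ is given by

$$E[\beta - 1/a] = E\Big[\sum_m x_m\big(\langle\log\Phi_m\rangle - \log\Phi_m\big)/n\Big]. \tag{3.2}$$

Note that

$$E\Big[u\sum_m\langle\log\Phi_m\rangle\Big] - E\Big[u\sum_m\log\Phi_m\Big] = 0. \tag{3.3}$$

Subtracting this from (3.2), multiplying by $a$, and noting the definition of $y$ in (1.1),

$$aE[\beta - 1/a] = E\Big[\sum_m y_m\langle\log\Phi_m\rangle/n\Big] - E\Big[\sum_m y_m\log\Phi_m/n\Big]. \tag{3.4}$$

From (2.4)

$$E\Big[\sum_m y_m(-\log\Phi_m)/n\Big] = -(1-\gamma),$$

and with

$$E[y_m] = \bar y_m$$

the general formula for the bias of $\beta$ is given by

$$aE[\beta - 1/a] = -(1-\gamma) - \sum_m \bar y_m\langle-\log\Phi_m\rangle/n, \tag{3.5}$$

where

$$\langle-\log\Phi_m\rangle = \frac{1}{m} + \frac{1}{m+1} + \cdots + \frac{1}{n},$$

and index $m = 1$ corresponds to the smallest extreme and $m = n$ to the largest, with $\gamma$ representing Euler's constant 0.577216.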


4. Reduction of general formula for bias. Two formulas for the computation of $\bar y_m$ exist in the literature. One was published by the author in 1947 [5]:

$$\bar y_m = \gamma + \sum_{i=0}^{r}(-1)^i\binom{n}{i}\Delta^i\log(n-i), \qquad r = n - m, \tag{4.1}$$

where $\Delta^i$ represents the forward difference of ith order. Another formula for $\bar y_m$ was derived by Lieblein and published in 1953 [7]:

$$\bar y_m = m\binom{n}{m}\sum_{i=0}^{r}(-1)^i\binom{r}{i}g(m+i), \qquad g(z) = (\gamma + \log z)/z, \quad r = n - m. \tag{4.2}$$
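To see that (4.1) and (4.2) compute the same quantity, the Python sketch below evaluates both in ordinary double precision. The function names are hypothetical. Plain floating point is adequate only for small n, because the alternating binomial sums lose accuracy as n grows. The two functions should agree to rounding error. Their sum over m should equal $n\gamma$, since the order statistics together make up the whole sample and $E[y] = \gamma$.

```python
import math

EULER_GAMMA = 0.5772156649015329


def fwd_diff(f, x, order):
    """order-th forward difference of f at x with unit step."""
    return sum((-1) ** (order - j) * math.comb(order, j) * f(x + j)
               for j in range(order + 1))


def ybar_kimball(m, n):
    """E[y_m] from (4.1): gamma + sum_{i=0}^{r} (-1)^i C(n, i) Delta^i log(n - i)."""
    r = n - m
    return EULER_GAMMA + sum((-1) ** i * math.comb(n, i) * fwd_diff(math.log, n - i, i)
                             for i in range(r + 1))


def ybar_lieblein(m, n):
    """E[y_m] from (4.2): m C(n, m) sum_{i=0}^{r} (-1)^i C(r, i) g(m + i)."""
    r = n - m

    def g(z):
        return (EULER_GAMMA + math.log(z)) / z

    return m * math.comb(n, m) * sum((-1) ** i * math.comb(r, i) * g(m + i)
                                     for i in range(r + 1))
```

For n = 2 and m = 1, for example, both functions reduce to $\gamma - \log 2$.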


Although for computational purposes there is little difference between the two formulas, the author found that the reduction in question proceeds more directly from the substitution of (4.2) into (3.5). After considerable maneuvering, one arrives at


(43) as Gm(log | Pn) tibia yar *( log r i 


n r=a=2 


rir — 1) 


where $\Delta^k_{-1}$ refers to the kth-order difference with unit interval, proceeding in the negative sense. This, in turn, can be expressed as

$$\frac{1}{n}\sum_{m=1}^{n}\bar y_m\langle-\log\Phi_m\rangle = \gamma - \sum_{r=2}^{n}\frac{1}{r(r-1)}\sum_{k=1}^{r-1}(-1)^{k-1}\Delta^k\log 1. \tag{4.4}$$
Here the differences are taken in the positive sense. This formula is better suited 
to computation than the preceding one when differences of the logarithm are 
already available. A further reduction gives a form which is better suited to 
theoretical evaluation, as will be seen in the next section. This is 


$$\frac{1}{n}\sum_{i=1}^{n}\bar y_i\langle-\log\Phi_i\rangle = \gamma - \sum_{r=1}^{n-1}(-1)^{r+1}\frac{\Delta^r\log 1}{r} - \frac{1}{n}\sum_{r=1}^{n-1}(-1)^{r}\Delta^r\log 1. \tag{4.5}$$

5. On the convergence of the above series. From the general theory of a distribution function of an ordered sample value, it can be proved that as $n \to \infty$

$$E\Big[\sum_i y_i\langle\log\Phi_i\rangle/n\Big] - E\Big[\sum_i y_i\log\Phi_i/n\Big] \to 0$$

if $\Phi$ is continuous. Accordingly, from (3.5), the expressions on the right of equations (4.3) to (4.5) should approach $\gamma - 1$ as $n \to \infty$. Set

$$R_n(x) = \sum_{r=1}^{n-1}(-1)^{r+1}\frac{\Delta^r\log x}{r}. \tag{5.1}$$

By expanding a function $f(x+t)$ in a series of differences about $t = 0$ and differentiating as to $t$, it is easily verified that

$$f'(x) + \omega(x) = \sum_{r=1}^{\infty}(-1)^{r+1}\frac{\Delta^r f(x)}{r}, \tag{5.2}$$

where $\omega(x)$ is periodic with period unity. Thus, writing $R(x) = \lim_{n\to\infty}R_n(x)$,

$$1/x + \omega(x) = R(x), \qquad x > 0. \tag{5.3}$$


² An alternative reduction has been suggested by a referee. This involves writing out the differences $\Delta^i\log(n-i)$ in (4.1) as linear functions of $\log(n-i)$ and multiplying by the expression for $E[-\log\Phi_m]$ given in (2.7). He obtains

$$\frac{1}{n}\sum_{m=1}^{n}\bar y_m\langle-\log\Phi_m\rangle = \gamma + \sum_{i=1}^{n-1}\frac{(-1)^i}{i(i+1)}\binom{n-1}{i}\log(i+1).$$

This becomes identical with (4.4) when expressed in terms of differences of the logarithm. It is believed that the form (4.4) is more convenient for purposes of computation because of the relation (6.6).


It is known that [11]

$$\lim_{n\to\infty} n^x\log n\,\big|\Delta^n\log x\big| = \Gamma(x), \tag{5.4}$$


and that for finite n [3] 


(5.5) A" loga | < _ . : «> 
a(x + 1)--: (x +n) 


Hence the infinite series $R(x)$ converges for positive values of $x$, uniformly for $x \ge x_0 > 0$. Accordingly,

$$\lim_{x\to\infty} R(x) = 0.$$

It follows that $\lim_{x\to\infty}\omega(x) = 0$, and since $\omega(x)$ is periodic, $\omega(x) \equiv 0$. We have then

$$R(x) = 1/x, \qquad x > 0,$$

and

$$R(1) = 1.$$

It is easily proved from (5.4) and (5.5) that the last series in (4.5) approaches zero as $n \to \infty$. This completes the proof of the convergence of the sum of the two series on the right of (4.5) to the value unity. It follows that the series on the right of (4.3) and (4.4) also converge to unity as $n$ becomes infinite.
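Numerically, (5.3) with x = 1 says that the partial sums $R_n(1)$ of (5.1) increase toward 1. The sketch below evaluates $R_n(x)$ for integer x with Python's decimal module. Exact-looking arithmetic matters here, because the high-order differences of $\log x$ are tiny numbers produced by heavy cancellation. The 120-digit working precision is a generous assumption, chosen for n up to a few hundred.

```python
from decimal import Decimal, getcontext

# High-order differences of log x come from heavy cancellation, so carry many digits.
getcontext().prec = 120


def forward_diffs(values):
    """Given [f(x), f(x+1), ..., f(x+N)], return [Delta^0 f(x), ..., Delta^N f(x)]."""
    out, row = [], list(values)
    while row:
        out.append(row[0])
        row = [b - a for a, b in zip(row, row[1:])]
    return out


def R(n, x):
    """(5.1): R_n(x) = sum_{r=1}^{n-1} (-1)^(r+1) Delta^r log x / r, for integer x >= 1."""
    d = forward_diffs([Decimal(x + j).ln() for j in range(n)])
    return sum((-1) ** (r + 1) * d[r] / r for r in range(1, n))
```

Convergence is slow for x = 1, because by (5.4) the terms decay roughly like $1/(r^2\log r)$. It is much faster for x = 2.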

6. Tabulation of the bias factor. For purposes of computing a table of bias factors, the formula (4.4) has been used, since it was possible to obtain from the National Bureau of Standards a table of $\Delta^i\log 1$ from $i = 1$ to $i = 111$. For purposes of reference we refer to the series on the right of (4.4) as $S(n)$. Substituting (4.4) into (3.5), the general equation for the bias reduces to

$$E[\beta - 1/a] = -1/a + S(n)/a \tag{6.1}$$

or

$$E[\beta] = S(n)/a. \tag{6.2}$$

In tabulating the bias for ready use it seems preferable to tabulate a factor $b_n$ such that

$$1/a = b_n E(\beta); \tag{6.3}$$

hence the bias factor $b_n$ is given by

$$b_n = 1/S(n), \tag{6.4}$$

where

$$S(n) = \sum_{r=2}^{n}\frac{1}{r(r-1)}\sum_{j=1}^{r-1}(-1)^{j-1}\Delta^j\log 1. \tag{6.5}$$

As a computational matter it is to be noted that

$$S(n) - S(n-1) = \frac{1}{n(n-1)}\sum_{j=1}^{n-1}(-1)^{j-1}\Delta^j\log 1. \tag{6.6}$$

The author has computed the series to seven decimal places, using the tabulated values of $\Delta^i\log 1$ furnished by the National Bureau of Standards. Through n = 21, values of S(n) were checked by alternate computation, using the series on the right of (4.5). Values of S(n) agreed to within two units in the seventh decimal place. Beyond that point computations of S(n) were checked by repetition and by comparing cumulative adding-machine tapes. In this way S(n) was tabulated to n = 112, accuracy being guaranteed to the sixth decimal place, subject, of course, to the accuracy of values of $\Delta^i\log 1$ furnished by the Bureau of Standards.
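The recursion (6.6) also makes it easy to regenerate the bias factors by machine. The sketch below computes $\Delta^j\log 1$ by repeated differencing in high-precision decimal arithmetic, standing in for the National Bureau of Standards table. It then accumulates S(n) and inverts it as in (6.4). The function names and the 80-digit precision are assumptions of this illustration. For small n its output can be compared with the table that follows.

```python
from decimal import Decimal, getcontext

getcontext().prec = 80  # generous: differences up to order 111 lose roughly 34 digits


def diffs_of_log_at_1(n_max):
    """[Delta^j log 1 for j = 0 .. n_max - 1], by repeated differencing of log 1 .. log n_max."""
    row = [Decimal(x).ln() for x in range(1, n_max + 1)]
    out = []
    while row:
        out.append(row[0])
        row = [b - a for a, b in zip(row, row[1:])]
    return out


def bias_factors(n_max):
    """{n: b_n} for 2 <= n <= n_max, using (6.6) for S(n) and (6.4) b_n = 1/S(n)."""
    d = diffs_of_log_at_1(n_max)
    s = Decimal(0)      # S(1) = 0: the sum (6.5) is empty
    inner = Decimal(0)  # running sum_{j=1}^{n-1} (-1)^(j-1) Delta^j log 1
    b = {}
    for n in range(2, n_max + 1):
        j = n - 1
        inner += (-1) ** (j - 1) * d[j]
        s += inner / (n * (n - 1))
        b[n] = 1 / s
    return b
```

Each term added to S(n) is positive and S(n) stays below 1, so the computed $b_n$ decrease steadily toward 1, as the table shows.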

A table of the reciprocals of S(n), carried to the nearest fourth decimal place and giving the bias factor $b_n$, is shown below.


TABLE OF BIAS FACTOR

Estimate of 1/a = $b_n\beta$; n = size of sample

  n   b_n           n   b_n           n   b_n           n   b_n
  1                31   1.0743       61   1.0388       91   1.0265
  2   2.8854       32   1.0720       62   1.0382       92   1.0263
  3   1.9606       33   1.0699       63   1.0376       93   1.0260
  4   1.6503       34   1.0679       64   1.0371       94   1.0257
  5   1.4941       35   1.0661       65   1.0365       95   1.0255
  6   1.3997       36   1.0643       66   1.0360       96   1.0252
  7   1.3363       37   1.0626       67   1.0355       97   1.0250
  8   1.2907       38   1.0610       68   1.0350       98   1.0247
  9   1.2563       39   1.0595       69   1.0345       99   1.0245
 10   1.2294       40   1.0581       70   1.0340      100   1.0243
 11   1.2078       41   1.0567       71   1.0336      101   1.0240
 12   1.1900       42   1.0555       72   1.0331      102   1.0238
 13   1.1751       43   1.0542       73   1.0327      103   1.0236
 14   1.1625       44   1.0530       74   1.0323      104   1.0234
 15   1.1516       45   1.0519       75   1.0319      105   1.0232
 16   1.1421       46   1.0508       76   1.0315      106   1.0229
 17   1.1337       47   1.0498       77   1.0311      107   1.0227
 18   1.1264       48   1.0488       78   1.0307      108   1.0225
 19   1.1198       49   1.0478       79   1.0303      109   1.0223
 20   1.1139       50   1.0469       80   1.0300      110   1.0222
 21   1.1085       51   1.0460       81   1.0296      111   1.0220
 22   1.1037       52   1.0452       82   1.0293      112   1.0218
 23   1.0993       53   1.0444       83   1.0289
 24   1.0952       54   1.0436       84   1.0286
 25   1.0915       55   1.0428       85   1.0283
 26   1.0881       56   1.0421       86   1.0280
 27   1.0849       57   1.0414       87   1.0277
 28   1.0820       58   1.0407       88   1.0274
 29   1.0792       59   1.0401       89   1.0271
 30   1.0767       60   1.0394       90   1.0268








7. Application to Type II extreme-value distribution. In certain problems where a finite lower limit bounds the observed variable, the Type II or Type III extreme-value distribution is found more descriptive of the practical situation (e.g., application to breaking strength of materials [1], maximum wind speeds [10], etc.). Since the treatment is the same except for sign, we examine only the case of the Type II distribution. This distribution is usually given in the following form (see formula (3.19) of [2]):

$$\Psi = \exp\big[-(z/b)^{-a}\big], \qquad z > 0, \tag{7.1}$$

where $\Psi$ is the cdf of the variable $z$, and $a$ and $b$ are parameters. Taking logarithms, this is the same as

$$-\log(-\log\Psi) = a(\log z - \log b). \tag{7.2}$$

Thus if we set

$$x = \log z \quad\text{and}\quad u = \log b, \tag{7.3}$$

the distribution (7.1) becomes identical with (1.1), with

$$y = a(x-u) = a(\log z - \log b), \tag{7.4}$$

and the parameter $a$ is the same in the two distributions. Hence the estimation of the parameter $a$, or its reciprocal $\beta$, may proceed by setting

$$x_i = \log z_i \tag{7.5}$$

and applying equation (3.1) for the estimate of $\beta$, followed by use of the tabulated bias factor.
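The procedure just described can be sketched in a few lines of Python. The code maps Type II observations to $x_i = \log z_i$, applies (3.1), and multiplies by the bias factor $b_n$, which the caller supplies from the table. The function names are hypothetical. One sanity check is that multiplying every $z_i$ by a constant only shifts the $x_i$, so the estimate of 1/a does not change. This is as it should be, since such a rescaling alters b but not a.

```python
import math


def beta_type1(xs):
    """Estimator (3.1): xbar - sum_m x_m (1/m + ... + 1/n) / n, x_m in ascending order."""
    xs = sorted(xs)
    n = len(xs)
    tails = [sum(1.0 / j for j in range(m, n + 1)) for m in range(1, n + 1)]  # (2.7)
    return sum(xs) / n - sum(x * t for x, t in zip(xs, tails)) / n


def inv_a_type2(zs, b_n):
    """Bias-corrected estimate of 1/a for the Type II law (7.1) (hypothetical helper).

    b_n is the tabulated bias factor for n = len(zs), supplied by the caller.
    """
    if any(z <= 0 for z in zs):
        raise ValueError("Type II observations must be positive")
    return b_n * beta_type1([math.log(z) for z in zs])  # (7.5): x_i = log z_i
```

Raising every observation to a power p multiplies the estimate by p, consistent with $z^p$ following (7.1) with parameter $a/p$.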


8. Comparison of estimator $\beta$ with that of optimum weighting for sample of size six. The estimator $\beta$ defined by (3.1) is a linear function of the sample values $x_m$, with coefficients given by

$$c_m = \big(1 + \langle\log\Phi_m\rangle\big)/n, \qquad \sum_m c_m = 0.$$

In a recent monograph, Lieblein [8] has developed a linear unbiased estimator of the scale parameter 1/a in which the coefficients are determined so that the variance of the estimate is a minimum. The "optimum" coefficients or "weights" have been determined explicitly for samples as large as six. For larger samples, a grouping procedure is recommended. Specific optimum weights for each ordered value of larger samples are not available.

It will be of some interest to compare the series of weights of the two linear estimators for a sample of size n = 6. For the estimator $\beta$ of (3.1), the bias factor for n = 6 is found from the table to be $b_6 = 1.3997$. Thus the unbiased estimate of 1/a is $b_6\beta$, with coefficients $b_6 c_m$, which turn out to be

w₁ = −.3383, w₂ = −.1050, w₃ = .0117,
w₄ = .0894, w₅ = .1477, w₆ = .1944.

The optimum weights found by Lieblein for a sample of size six are

w₁ = −.4593, w₂ = −.0360, w₃ = .0732,
w₄ = .1267, w₅ = .1495, w₆ = .1458.

It is perhaps of some significance that in each case the sum of the weights is zero.
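The weights $b_6 c_m$ quoted above can be reproduced exactly. The sketch below forms $c_m = (1 + \langle\log\Phi_m\rangle)/n$ with rational arithmetic and confirms that the coefficients sum to zero. It then rescales by $b_6 = 1.3997$ from the table. Lieblein's optimum weights are simply copied from the text so that the two sets can be compared side by side. The function name is hypothetical.

```python
from fractions import Fraction


def kimball_coefficients(n):
    """c_m = (1 + <log Phi_m>)/n = (1 - (1/m + ... + 1/n))/n for m = 1..n, exactly."""
    coeffs = []
    for m in range(1, n + 1):
        tail = sum(Fraction(1, j) for j in range(m, n + 1))   # <-log Phi_m>, from (2.7)
        coeffs.append((1 - tail) / n)
    return coeffs


c = kimball_coefficients(6)
b6 = 1.3997                                   # bias factor for n = 6 from the table
w = [round(b6 * float(cm), 4) for cm in c]    # weights of the unbiased estimate b_6 * beta
lieblein = [-0.4593, -0.0360, 0.0732, 0.1267, 0.1495, 0.1458]  # optimum weights quoted above
```

Set side by side, the two lists show that the bias-corrected $\beta$ puts less negative weight on the smallest observation and more weight on the largest than Lieblein's optimum estimator does.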


9. Bias of estimate of parameter u from the maximum-likelihood equations. A curious fact about the relationship of the parameter $u$ to the maximum-likelihood equations is that if $u$ is estimated from the maximum-likelihood equations under the assumption that the other parameter $a$ is known, a bias results. The author discovered this some ten years ago while working on a related paper [4]. With the present interest in the extreme-value distribution, it seems worthwhile to bring this out. Evaluation of the bias proceeds as follows.

Define the variable $\xi$ by

$$\xi = e^{-ax}, \tag{9.1}$$

and take

$$\bar\xi = \sum_i e^{-ax_i}/n, \qquad \xi_0 = e^{-au}, \tag{9.2}$$

where $x_i$ is distributed as in (1.1). With $a$ known, $u$ is to be estimated from (1.3) above. Denoting this estimate by $\hat u$, we have as the equation of estimate

$$e^{-a\hat u} = \sum_i e^{-ax_i}/n = \bar\xi, \tag{9.3}$$

and note from (9.2) that

$$e^{-a(\hat u - u)} = \bar\xi/\xi_0, \tag{9.4}$$

and hence that

$$a(\hat u - u) = -\log(\bar\xi/\xi_0). \tag{9.5}$$

Thus, taking the moment generating function as (see [6])

$$G(\theta) = E\big[e^{\theta a(\hat u - u)}\big] = E\big[(\bar\xi/\xi_0)^{-\theta}\big],$$

we have

$$G'(0) = aE[\hat u - u].$$

In a previous paper (see formula (4.3) of [4]), the author showed that the pdf of $\bar\xi$ is given by

$$P(\bar\xi)\,d\bar\xi = \frac{1}{\Gamma(n)}\,e^{-n\bar\xi/\xi_0}\,(n\bar\xi/\xi_0)^{n-1}\,\frac{n\,d\bar\xi}{\xi_0}.$$

Hence

$$G(\theta) = n^{\theta}\,\Gamma(n-\theta)/\Gamma(n) \tag{9.6}$$

and

$$G'(0) = -\Gamma'(n)/\Gamma(n) + \log n = \gamma + \log n - \Big(1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{n-1}\Big).$$





Accordingly, the bias in the maximum-likelihood estimate of $u$, with $a$ known, is given by

$$E[\hat u - u] = \frac{1}{a}\Big[\gamma + \log n - \Big(1 + \frac{1}{2} + \cdots + \frac{1}{n-1}\Big)\Big]. \tag{9.7}$$

In this connection, note from (9.6) that

$$E[\bar\xi/\xi_0] = G(-1) = 1$$

and hence that

$$E\big[e^{-a\hat u}\big] = e^{-au}. \tag{9.8}$$

Thus, if, in the Type II distribution, one took the parameters as $a$ and $b^{-a}$, and if $b^{-a}$ were estimated from

$$\text{Estimate of } b^{-a} = \sum_i z_i^{-a}/n \tag{9.9}$$

with $a$ known, no bias would result from such an estimate.

It may be of interest in this connection to evaluate the variance of the estimate $\hat u$, with parameter $a$ known. From the definition of $G(\theta)$, it follows that

$$G''(0) = E\big[a^2(\hat u - u)^2\big].$$

From (9.6) we have

$$G''(0) = (\log n)\,G'(0) - (\log n)\,\Gamma'(n)/\Gamma(n) + \Gamma''(n)/\Gamma(n).$$

This can be reduced to

$$G''(0) = [G'(0)]^2 + \psi'(n), \tag{9.10}$$

where

$$\psi(x) = \Gamma'(x)/\Gamma(x). \tag{9.11}$$

This latter function is a well-known mathematical function [9]. Its derivative, when $x$ is an integer, is an infinite series of the reciprocals of the squares of the natural numbers beginning with $x$, and $\psi'(1) = \pi^2/6$. Thus we have the result that the variance of the estimate $\hat u$, with parameter $a$ known, is given by

$$E\big[(\hat u - u)^2\big] - (\text{Bias})^2 = \frac{1}{a^2}\Big[\frac{\pi^2}{6} - \Big(1 + \frac{1}{2^2} + \frac{1}{3^2} + \cdots + \frac{1}{(n-1)^2}\Big)\Big]. \tag{9.12}$$
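Formulas (9.7) and (9.12) are easy to check by simulation. In the sketch below, $\hat u$ comes from (9.3) with $a$ known. Drawing $w_i = e^{-y_i}$ as unit exponentials gives $\hat u = u - \log(\bar w)/a$, in line with (9.4) and (9.5). The sample size, parameters, seed and replication count are arbitrary illustrative choices, and the function names are hypothetical.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329


def u_hat_bias(n, a):
    """(9.7): E[u_hat - u] = (1/a)[gamma + log n - (1 + 1/2 + ... + 1/(n-1))]."""
    return (EULER_GAMMA + math.log(n) - sum(1.0 / j for j in range(1, n))) / a


def u_hat_variance(n, a):
    """(9.12): (1/a^2)[pi^2/6 - (1 + 1/2^2 + ... + 1/(n-1)^2)]."""
    return (math.pi ** 2 / 6 - sum(1.0 / j ** 2 for j in range(1, n))) / a ** 2


def simulate_u_hat(n, a, u, reps, seed=1):
    """Monte Carlo bias and variance of u_hat from (9.3), with a known (hypothetical helper)."""
    rng = random.Random(seed)
    draws = []
    for _ in range(reps):
        # w_i = exp(-y_i) ~ Exp(1), so exp(-a x_i) = exp(-a u) w_i and u_hat = u - log(mean w)/a.
        w_bar = sum(rng.expovariate(1.0) for _ in range(n)) / n
        draws.append(u - math.log(w_bar) / a)
    mean = sum(draws) / reps
    var = sum((d - mean) ** 2 for d in draws) / (reps - 1)
    return mean - u, var
```

For n = 1 the two formulas reduce to the mean $\gamma/a$ and variance $\pi^2/(6a^2)$ of a single Type I observation. As n grows, the bias tends to zero.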


10. General remarks. A fact about the maximum-likelihood estimate of the scale parameter $\beta = 1/a$ as it relates to practical problems, which does not seem to have been brought out in the literature, is the following: inspection of equation (1.2) or (3.1) shows that in this estimate, values of the observed series $x_i$ that are near the lower extreme are much more heavily weighted than those near the upper extreme. From a theoretical point of view this is entirely rational. In practice, however, the extreme-value distribution is often used to fit an observed series of extremes without any very sound theoretical basis, merely because it seems to describe fairly well the behavior of the upper part of the series of extremes. It should be pointed out that in such cases the maximum-likelihood estimate of $\beta$ may very well be worse than an estimate which gives less weight to the lower part of the series.

For example, the distribution of Type I is sometimes used even when the lower limit of the series of observations is fixed. In such a case, theory is not satisfied at that end of the series and, accordingly, some distortion of the theoretical fit at that end of the series is to be expected. Hence in such a case, preference for the maximum-likelihood estimate because of its greater theoretical efficiency is questionable.


Acknowledgment. Acknowledgment is made to the National Bureau of Standards for their courtesy in furnishing a tabulation of the differences $\Delta^i\log 1$ through $i = 111$.


REFERENCES

[1] B. Epstein, "Application of the theory of extreme values in fracture problems," J. Amer. Stat. Assn., Vol. 43 (1948), pp. 403-412.

[2] E. J. Gumbel, Statistical Theory of Extreme Values and Some Practical Applications, National Bureau of Standards, Applied Mathematics Series 33, U. S. Government Printing Office, Washington, D. C., 1954.

[3] B. F. Kimball, "The application of Bernoulli polynomials of negative order to differencing," Amer. J. Math., Vol. 55 (1933), pp. 399-416; formula (6.15) derived for odd values of n (see top of p. 414).

[4] B. F. Kimball, "Sufficient statistical estimation functions for the parameters of the distribution of maximum values," Ann. Math. Stat., Vol. 17 (1946), pp. 299-309.

[5] B. F. Kimball, "Assignment of frequencies to a completely ordered set of sample data," Trans. Amer. Geophys. Union, Discussion, Vol. 28 (1947), p. 952.

[6] B. F. Kimball, "An approximation to the sampling variance of an estimated maximum value of given frequency based on fit of doubly exponential distribution of maximum values," Ann. Math. Stat., Vol. 20 (1949), pp. 110-113; equations (4) and (5) with $y_0 = 0$, $g = u$.

[7] J. Lieblein, "On the exact evaluation of the variances and covariances of order statistics in samples from the extreme-value distribution," Ann. Math. Stat., Vol. 24 (1953), pp. 282-287.

[8] J. Lieblein, A New Method of Analyzing Extreme-Value Data, Technical Note 3053, National Advisory Committee for Aeronautics, Washington 25, D. C., January 1954. Pp. 88. See p. 16 and Table I, p. 72.

[9] N. E. Nörlund, Differenzenrechnung, Julius Springer, Berlin, 1924, p. 111, formula (36), and p. 104, formula (19).

[10] H. C. S. Thom, "Frequency of maximum wind speeds," Proc. Amer. Soc. Civil Engrs., Vol. 80 (1954), Separate 539, pp. 11.

[11] M. Ward, "Solution of Problem 3591," Amer. Math. Monthly, Vol. 40 (1933), pp. 614-616.





A QUEUEING SYSTEM WITH $\chi^2$ SERVICE-TIME DISTRIBUTION¹


By DAVID M. G. WISHART


Department of Statistics, University of Aberdeen, Aberdeen, Scotland 
Formerly of Princeton University 


Summary. A stochastic process associated with a queueing system is specified by knowledge of (i) the input, (ii) the queue discipline, and (iii) the service mechanism. A system in which the input is of the "general independent" type and the service times are independent and identically distributed according to an arbitrary, general law is given the label GI/G/s, where s is the number of servers (see Kendall [4]). An appointment system for arrivals (or regular service times) is designated by D (deterministic); M describes random arrivals (or negative-exponential service times); and $E_k$ (Erlangian) indicates that a scale-modified $\chi^2$ distribution with 2k degrees of freedom governs the input (or service mechanism). Note that M is equivalent to $E_1$.

The following study was suggested by Kendall in order to extend his description of the system GI/M/s (see [4]) to the system GI/$E_k$/s. This service time is thought of as the sum of k independent components, identically distributed with negative-exponential distributions. The general system GI/$E_k$/s, however, appears currently to be intractable in this form, so that we confine ourselves, in this paper, to the system GI/$E_k$/1. We analyse this with the aid of an embedded Markov chain, deriving the stationary distribution for the number of customers in the system at epochs of arrival (equation 1.16) and the distribution of the waiting time for an arbitrary customer (equation 1.21).

Lindley [5] has discussed the problem of the waiting time in the system D/$E_k$/1, solving for this particular example an integral equation governing all systems of the type GI/G/1: the equivalence of our waiting time distribution is demonstrated in Section 2.

Pollaczek ([6] and [7]) and Smith [8] have also considered systems of this kind.


1. The system GI/$E_k$/1. We consider the following queueing system:

(i) General independent input: i.e., the time intervals between arrivals are independent and are identically distributed according to the law $dA(u)$, with $0 < \int_0^\infty u\,dA(u) = a < \infty$ and $A(0+) = 0$.

(ii) Queue discipline: a single line, and "first come, first served."

(iii) Service mechanism: a single server, who serves each customer independently of previous customers and of the queue length; the service times are identically distributed with a scale-modified $\chi^2$ distribution of mean $b$ and $2k$ degrees of freedom. Thus

$$dB(v) = e^{-vk/b}\,\frac{(kv/b)^{k-1}}{(k-1)!}\,\frac{k\,dv}{b}. \tag{1.1}$$

Following Erlang (see [1]) we can suppose this service time to arise in the following manner. We imagine the service to take place in $k$ consecutive phases; if the time spent in the $i$th phase is $\tau_i$, we assume the $\tau_i$ to be independent and identically distributed according to a negative-exponential distribution with mean $b/k$, so that

$$\Pr(\tau_i > t) = e^{-kt/b}. \tag{1.2}$$

Then the total service time, $\tau = \tau_1 + \cdots + \tau_k$, has the modified $\chi^2$ distribution given above.

Received March 1955.
¹ This work was done under contract with the U.S. Office of Ordnance Research.
given above. 

We specify the system by two numbers: $q$, the total number of customers in the system; and $p$, the phase in which the customer receiving service is found. Our sample function is the vector $(N(t), n(t))$, where $N(t)$ is the total number in the system at time $t$, and $n(t)$ the phase in which the customer receiving service is found at time $t$. We define $n(t) = 0$ when $N(t) = 0$. We take these step-functions to be continuous to the right, and, following Kendall [4], we consider the statement "A customer has just arrived." This is equivalent to the construction of a set $\Pi$ of epochs $t$ such that $N(t) = N(t-0) + 1$. Since we disregard multiple arrivals, $\Pi$ is almost certainly denumerable and may be strictly ordered:

$$\Pi = \{t_n;\ n = 1, 2, 3, \ldots\} \tag{1.3}$$

where $t_n < t_{n+1}$ for all $n$, and $t_1$ is an arrival epoch with which observation begins. Write $X_n = \{N(t_n - 0), n(t_n - 0)\}$; then

$$\operatorname{distr}\{X_{n+1} \mid X_m \text{ for all } m \le n\} = \operatorname{distr}\{X_{n+1} \mid X_n\},$$

so that with this description of the state, the epochs of arrival form the time-points of an embedded Markov chain with a denumerable infinity of states.

Let us number the states in this way: to the state $(q, p)$, if $qp \ne 0$, we attach the label $q_{k-p+1}$ (i.e., the suffix indicates the number of phases yet to be completed by the customer at the service point); $q = 0$ implies $p = 0$, so we attach the label 0, with no suffix, to the state $(0, 0)$. We consider the matrix of transition probabilities,

$$P = \{p_{i\mu,j\nu}\} \qquad (i, j = 0, 1, 2, \ldots;\ 1 \le \mu, \nu \le k).$$

Clearly $p_{i\mu,j\nu} = 0$ if $j > i + 1$; and if $j = i + 1$, then $p_{i\mu,j\nu} = 0$ if $\nu > \mu$. To evaluate the nonzero elements $p_{i\mu,j\nu}$ with $ij \ne 0$, we note first that there are $n = i + 1 - j$ departures during an arrival interval $u$. The transition $i_\mu \to j_\nu$ therefore implies the completion of $n$ service-time intervals distributed according to (1.1), or $nk$ intervals distributed according to (1.2): in the transition $i_\mu \to j_\nu$, when $\mu \ne \nu$ the number of departures is unaffected, but the number of negative-exponential time intervals is increased by $\mu - \nu$ (which may of course be negative). We consider therefore the possibility that the sum of $nk + \mu - \nu$ independent time intervals distributed according to (1.2) is less than or equal to $u$, whereas the sum of $nk + \mu - \nu + 1$ such intervals is greater than $u$.

Put

$$S_r = \sum_{m=1}^{r}\tau_m$$

and

$$(j, \nu \mid i, \mu;\ u) = \Pr\big(S_{nk+\mu-\nu} \le u,\ S_{nk+\mu-\nu+1} > u\big). \tag{1.4}$$

Then

$$p_{i\mu,j\nu} = \int_0^\infty (j, \nu \mid i, \mu;\ u)\,dA(u). \tag{1.5}$$
“0 
We are concerned here with a special instance of the following theorem: Given two positive-valued random variables X and Y, independent and distributed according to F and G respectively, then

$$\Pr(X \le u,\ X + Y > u) = \int_0^u \big[1 - G(u-x)\big]\,dF(x).$$

Let E denote the event $\{X \le u,\ X + Y > u\}$; then from the assumption of independence,

$$\Pr(E) = \Pr(X \le u) - \Pr(X + Y \le u) = F(u) - \int_0^u G(u-x)\,dF(x) = \int_0^u \big[1 - G(u-x)\big]\,dF(x).$$

Here we have $G(x) = 1 - e^{-xk/b}$ and $dF(x) = e^{-xk/b}\,\frac{(kx/b)^{r-1}}{(r-1)!}\,\frac{k\,dx}{b}$, so that

$$\Pr(E) = \frac{(ku/b)^r}{r!}\,e^{-ku/b}. \tag{1.6}$$

(This may also be seen directly: the completion of exactly r negative-exponential service-time intervals by time u is equivalent to the occurrence of exactly r events in a fictitious Poisson process with intensity k/b.)
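The identity behind (1.6) can also be checked numerically. The sketch integrates $\int_0^u [1 - G(u-x)]\,dF(x)$, with F the r-phase Erlang law and G a single exponential phase, and compares the result with the Poisson probability. It uses composite Simpson's rule. The rate k/b = 1.5 and u = 2 in the check are arbitrary, the helper names are hypothetical, and the Erlang density is written for r ≥ 1 only.

```python
import math


def erlang_pdf(x, r, rate):
    """Density of the sum of r >= 1 independent phases, each exponential with the given rate."""
    return rate * (rate * x) ** (r - 1) * math.exp(-rate * x) / math.factorial(r - 1)


def prob_exactly_r_phases(u, r, rate, steps=2000):
    """Pr(S_r <= u < S_{r+1}) = integral_0^u [1 - G(u - x)] dF(x), by composite Simpson's rule."""
    h = u / steps                      # steps must be even for Simpson's rule
    total = 0.0
    for i in range(steps + 1):
        x = i * h
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        survival = math.exp(-rate * (u - x))   # 1 - G(u - x) for one exponential phase
        total += weight * survival * erlang_pdf(x, r, rate)
    return total * h / 3


def poisson_pmf(r, mean):
    """Right-hand side of (1.6), with mean = k u / b."""
    return mean ** r * math.exp(-mean) / math.factorial(r)
```

Because the integrand reduces to $e^{-ku/b}$ times a polynomial in x, Simpson's rule is very accurate here.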

We have, therefore,

$$p_{i\mu,j\nu} = \int_0^\infty \frac{(ku/b)^r}{r!}\,e^{-ku/b}\,dA(u), \tag{1.5'}$$

where $r = k(i + 1 - j) + \mu - \nu$ as above, and for future simplicity we will abbreviate this integral to $\eta_r$. The transitions $0 \to 1_\nu$ have probabilities

$$p_{0,1\nu} = \int_0^\infty \Pr\big(S_{k-\nu} \le u,\ S_{k-\nu+1} > u\big)\,dA(u) = \eta_{k-\nu}.$$

Finally, to calculate the transition probabilities $p_{i\mu,0}$, we see that these are given by $\Pr(S_{ik+\mu} \le u)$ integrated over $u$:

$$p_{i\mu,0} = \int_0^\infty\Big(\int_0^u e^{-xk/b}\,\frac{(kx/b)^{ki+\mu-1}}{(ki+\mu-1)!}\,\frac{k\,dx}{b}\Big)\,dA(u),$$

$$p_{00} = \int_0^\infty \Pr(S_k \le u)\,dA(u).$$


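It is easy to check that the row sums are equal to one. We therefore have the matrix P of transition probabilities:

$$
P = \begin{array}{c|c|cccc|cccc|c}
 & 0 & 1_1 & 1_2 & \cdots & 1_k & 2_1 & 2_2 & \cdots & 2_k & \cdots \\ \hline
0 & p_{00} & \eta_{k-1} & \eta_{k-2} & \cdots & \eta_0 & 0 & 0 & \cdots & 0 & \cdots \\
1_1 & p_{1_1,0} & \eta_k & \eta_{k-1} & \cdots & \eta_1 & \eta_0 & 0 & \cdots & 0 & \cdots \\
\vdots & \vdots & \vdots & \vdots & & \vdots & \vdots & \vdots & & \vdots & \\
1_k & p_{1_k,0} & \eta_{2k-1} & \eta_{2k-2} & \cdots & \eta_k & \eta_{k-1} & \eta_{k-2} & \cdots & \eta_0 & \cdots \\
2_1 & p_{2_1,0} & \eta_{2k} & \eta_{2k-1} & \cdots & \eta_{k+1} & \eta_k & \eta_{k-1} & \cdots & \eta_1 & \cdots \\
\vdots & \vdots & \vdots & \vdots & & \vdots & \vdots & \vdots & & \vdots &
\end{array}
$$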
The quantities $\eta_n$ $(n = 0, 1, 2, \ldots)$ form a probability vector with generating function $F(z)$ given by

$$F(z) = \sum_{n=0}^{\infty}\eta_n z^n = \int_0^\infty \sum_{n=0}^{\infty}\frac{(kuz/b)^n}{n!}\,e^{-ku/b}\,dA(u) = \int_0^\infty \exp\Big\{-\frac{ku(1-z)}{b}\Big\}\,dA(u) \tag{1.8}$$

and

$$F'(1-0) = ka/b. \tag{1.9}$$

We define a parameter $\rho = b/a$, the relative traffic intensity. As usual we assume that $\rho < 1$, and we shall see that the chain is ergodic.

The matrix is irreducible, since every state can be reached from every other state in a finite number of steps with positive probability; and it is aperiodic, since the diagonal elements are positive. Since P is irreducible, all states are of the same type, i.e., they are either all transient, all recurrent-null, or all ergodic. It follows from Theorem 2 of Chapter 15.6 of Feller [2] that P is ergodic if and only if we can construct a row-vector $x \ne 0$ such that $xP = x$ and $\sum_i |x_i| < \infty$.

The similarity between this matrix and that obtained by Kendall for the system GI/M/s (see [4], p. 348) suggests the substitution $x_{i\mu} = \lambda^n$ (where $n = \mu + k(i-1)$ for the state $i_\mu$, and for the state 0, $n = 0$).


$$p_{i_\mu, j_\nu} = \eta_r \qquad (r = k(i + 1 - j) + \mu - \nu;\ i, j \ne 0);$$
772 DAVID M. G. WISHART 
therefore the equation $xP = x$ is equivalent to
$$x_m = \sum_{r=0}^\infty x_{m-k+r}\,\eta_r,$$
and if we make the substitution, this becomes
$$\lambda^k = F(\lambda). \tag{1.10}$$


If $\rho < 1$, this equation has exactly $k$ roots inside the unit circle because, if $\rho < 1$, then $F'(1 - 0) = k/\rho > k$; and if $\delta > 0$, then there exists a real number $r$, $1 - \delta < r < 1$, such that $\sum_n \eta_n r^n < r^k$ (both sides equal 1 at $r = 1$, and the left side has the larger slope there). Therefore, on $|z| = r$,
$$|F(z)| = \Big| \sum_n \eta_n r^n e^{in\theta} \Big| \le \sum_n \eta_n r^n < r^k = |z^k|,$$
and by Rouché's theorem, the function $z^k - F(z)$ has the same number of zeros within the circle $|z| = r$ as $z^k$. We will show later that it is not necessary for these roots to be distinct. If these roots are distinct, and are $(\lambda_1, \ldots, \lambda_k)$, say, we try to express $x_m$ in the form
$$x_m = a_1\lambda_1^m + \cdots + a_k\lambda_k^m \qquad \Big(\sum a_i = 1\Big). \tag{1.11}$$


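We note that $x_0 \ne 0$ and that $\sum_n |x_n| \le \sum_i |a_i|/(1 - |\lambda_i|)$, which is finite, since $|\lambda_i| < 1$ for all $i$. For $m \ge k$, $xP = x$ is satisfied whatever the $a$'s if $\lambda_i^k = F(\lambda_i)$ $(1 \le i \le k)$; and for $1 \le m < k$, we obtain
$$x_m = \sum_{n=0}^\infty x_n\,\eta_{k+n-m} = \sum_{i=1}^k a_i \sum_{n=0}^\infty \lambda_i^n\,\eta_{k+n-m} = \sum_{i=1}^k a_i\lambda_i^m - \sum_{i=1}^k a_i\lambda_i^{m-k} \sum_{n=0}^{k-m-1} \lambda_i^n\,\eta_n.$$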


Put now $\omega_i = 1/\lambda_i$, and we have
$$\sum_{i=1}^k a_i \sum_{n=0}^{k-m-1} \omega_i^{k-m-n}\,\eta_n = 0 \qquad (m = 1, \ldots, k - 1). \tag{1.12}$$
The equation associated with the first column of $P$ will be identically satisfied because of the row-sum condition, and these $k - 1$ equations along with $\sum_{i=1}^k a_i = 1$ will serve to determine the $a$'s.


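Write
$$B_{mi} = \sum_{n=0}^{k-m-1} \omega_i^{k-m-n}\,\eta_n \quad (m \le k - 1), \qquad B_{ki} = 1 \ \text{for all } i;$$
$B = \{B_{mi}\}$ and $h = \{0, \ldots, 0, 1\}$; then
$$Ba = h. \tag{1.13}$$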


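$B$ may be written as the product of two matrices, thus:
$$B = CA = \begin{pmatrix} \eta_0 & \eta_1 & \cdots & \eta_{k-2} & 0\\ 0 & \eta_0 & \cdots & \eta_{k-3} & 0\\ \vdots & & \ddots & \vdots & \vdots\\ 0 & \cdots & 0 & \eta_0 & 0\\ 0 & \cdots & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} \omega_1^{k-1} & \cdots & \omega_k^{k-1}\\ \vdots & & \vdots\\ \omega_1 & \cdots & \omega_k\\ 1 & \cdots & 1 \end{pmatrix},$$
and since both these matrices are nonsingular ($C$ is triangular with determinant $\eta_0^{k-1} > 0$, and $A$ is a Vandermonde matrix in the distinct $\omega_i$; see, for instance, Ferrar [3], Theorem 8, p. 22), $B$ is nonsingular. The chain is therefore ergodic, and
$$a = A^{-1}C^{-1}h.$$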

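The factorization $B = CA$ can be checked numerically. The sketch below builds $B$ from its definition and $C$, $A$ from the displayed matrices, for an arbitrary, hypothetical choice of $\eta_0, \ldots, \eta_{k-1}$ and distinct $\omega_i$; complex $\omega_i$ are allowed, since the roots $\lambda_i$ are in general complex. All function names are illustrative.

```python
def build_B(eta, omega):
    """B_{mi} = sum_{n=0}^{k-m-1} omega_i^{k-m-n} eta_n for m < k; last row all ones."""
    k = len(omega)
    rows = []
    for m in range(1, k):
        rows.append([sum(w ** (k - m - n) * eta[n] for n in range(k - m)) for w in omega])
    rows.append([1.0] * k)
    return rows


def build_C(eta, k):
    """Upper-triangular C: entry (m, s) is eta_{s-m} for m <= s <= k-1; entry (k, k) is 1."""
    C = [[0.0] * k for _ in range(k)]
    for m in range(1, k):
        for s in range(m, k):
            C[m - 1][s - 1] = eta[s - m]
    C[k - 1][k - 1] = 1.0
    return C


def build_A(omega):
    """Vandermonde matrix with rows omega^{k-1}, ..., omega, 1."""
    k = len(omega)
    return [[w ** (k - s) for w in omega] for s in range(1, k + 1)]


def matmul(X, Y):
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


eta = [0.30, 0.20, 0.15, 0.10]                 # hypothetical eta_0, ..., eta_3
omega = [1.5, 2.0 + 0.5j, 2.0 - 0.5j, 3.0]     # hypothetical distinct omega_i (k = 4)
B = build_B(eta, omega)
CA = matmul(build_C(eta, len(omega)), build_A(omega))
max_err = max(abs(B[i][j] - CA[i][j]) for i in range(4) for j in range(4))
```

Up to rounding, `max_err` is zero, which is just the statement $B_{mi} = \sum_s C_{ms}A_{si}$.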

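Now
$$C^{-1}h = \frac{1}{|C|}\{C_{k1}, \ldots, C_{kk}\} = h,$$
where $|C|$ is the determinant of $C$ and $C_{ij}$ the cofactor of $c_{ij}$ in $|C|$. Hence
$$a = A^{-1}h = \frac{1}{|A|}\{A_{k1}, \ldots, A_{kk}\},$$
using the same notation as before. We have
$$|A| = \prod_{i<j} (\omega_i - \omega_j),$$
and
$$A_{ki} = (-1)^{k+i} \begin{vmatrix} \omega_1^{k-1} & \cdots & \omega_{i-1}^{k-1} & \omega_{i+1}^{k-1} & \cdots & \omega_k^{k-1}\\ \vdots & & \vdots & \vdots & & \vdots\\ \omega_1 & \cdots & \omega_{i-1} & \omega_{i+1} & \cdots & \omega_k \end{vmatrix} = (-1)^{k+i} \prod_{j \ne i} \omega_j \prod_{\substack{m<j\\ m, j \ne i}} (\omega_m - \omega_j).$$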


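Therefore
$$a_i = \prod_{j \ne i} \frac{\omega_j}{\omega_j - \omega_i} = \prod_{j \ne i} \frac{\lambda_i}{\lambda_i - \lambda_j} \tag{1.14a}$$
and
$$x_n = \sum_{i=1}^k \frac{\lambda_i^{k+n-1}}{\prod_{j \ne i} (\lambda_i - \lambda_j)}. \tag{1.14b}$$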
Also, $\pi = (\pi_n)$, the probability vector satisfying $\pi P = \pi$, is given by
$$\pi_n = \frac{\sum_{i=1}^k a_i\lambda_i^n}{\sum_{i=1}^k a_i/(1 - \lambda_i)}.$$
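Formulas (1.14a), (1.14b) and the normalization of $\pi_n$ are purely algebraic in the $\lambda_i$, so they can be checked with any distinct points of the unit disk. The sketch below does this for hypothetical $\lambda_i$ that are not roots of (1.10); it only illustrates the identities $\sum a_i = 1$ and $x_n = \sum a_i\lambda_i^n$, and the helper names are illustrative.

```python
def coefficients(lams):
    """a_i = prod_{j != i} lambda_i / (lambda_i - lambda_j), as in (1.14a); roots distinct."""
    a = []
    for i, li in enumerate(lams):
        prod = 1.0
        for j, lj in enumerate(lams):
            if j != i:
                prod *= li / (li - lj)
        a.append(prod)
    return a


def x_closed(lams, n):
    """x_n from (1.14b): sum_i lambda_i^{k+n-1} / prod_{j != i} (lambda_i - lambda_j)."""
    k = len(lams)
    total = 0.0
    for i, li in enumerate(lams):
        denom = 1.0
        for j, lj in enumerate(lams):
            if j != i:
                denom *= li - lj
        total += li ** (k + n - 1) / denom
    return total


def stationary(lams, n):
    """pi_n = sum_i a_i lambda_i^n / sum_i a_i / (1 - lambda_i)."""
    a = coefficients(lams)
    norm = sum(ai / (1 - li) for ai, li in zip(a, lams))
    return sum(ai * li ** n for ai, li in zip(a, lams)) / norm


lams = [0.6, -0.3, 0.2 + 0.4j, 0.2 - 0.4j]  # hypothetical distinct points in |z| < 1
a = coefficients(lams)
```

The identity $\sum_i a_i = 1$ is the Lagrange-interpolation identity $\sum_i \lambda_i^{k-1}/\prod_{j\ne i}(\lambda_i - \lambda_j) = 1$, which is why the normalization $x_0 = 1$ holds automatically.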


Denote the waiting time by $w$. The probability of not having to wait is given by
$$\Pr(w = 0) = \pi_0 = \Big(\sum_{i=1}^k \frac{a_i}{1 - \lambda_i}\Big)^{-1}.$$


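If the system is in state $i_\mu \ne 0$, then $n = \mu + k(i - 1)$, and the queue length is $Q = i - 1$. Consider the random variable $Q' = Q + 1$. Then
$$\Pr(Q' = i) = \sum_{\mu=1}^k \pi_{\mu + k(i-1)} = \frac{\sum_{\mu=1}^k \sum_{j=1}^k a_j \lambda_j^{\mu + k(i-1)}}{\sum_{j=1}^k a_j/(1 - \lambda_j)} = \frac{\sum_{j=1}^k a_j \lambda_j (1 - \lambda_j^k)\,\lambda_j^{k(i-1)}/(1 - \lambda_j)}{\sum_{j=1}^k a_j/(1 - \lambda_j)}.$$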


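Define
$$\gamma_j = \frac{a_j\lambda_j}{1 - \lambda_j} \Big/ \sum_{i=1}^k \frac{a_i}{1 - \lambda_i} \tag{1.15}$$
and note that $\sum_j \gamma_j = 1 - \pi_0$. Then
$$q_i = \Pr(Q' = i) = \sum_{j=1}^k \gamma_j (1 - \lambda_j^k)\,\lambda_j^{k(i-1)}, \tag{1.16}$$
and the cumulative distribution is
$$\Pr(Q' \le N) = \pi_0 + \sum_{i=1}^N q_i = 1 - \sum_{j=1}^k \gamma_j \lambda_j^{kN}. \tag{1.17}$$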
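The generating function for the $q_i$ is given by
$$Q(z) = \pi_0 + \sum_{i=1}^\infty q_i z^i = \pi_0 + z \sum_{j=1}^k \gamma_j (1 - \lambda_j^k) \sum_{i=0}^\infty (z\lambda_j^k)^i = \pi_0 + z \sum_{j=1}^k \frac{\gamma_j (1 - \lambda_j^k)}{1 - z\lambda_j^k}. \tag{1.18}$$
Therefore $\bar Q$, the expected value of $Q$, is given by
$$\bar Q = \bar Q' - (1 - \pi_0) = Q'(1) - (1 - \pi_0) = \sum_{j=1}^k \frac{\gamma_j \lambda_j^k}{1 - \lambda_j^k}. \tag{1.19}$$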

If a customer arrives to find the system in the state $i_\mu$, then his waiting-time distribution has Laplace Transform $(1 + bp/k)^{-\mu - k(i-1)}$. The probability of finding the system in this state is $\pi_{\mu + k(i-1)}$, so the waiting-time distribution for an arbitrary customer has Laplace Transform
$$\phi(p) = \pi_0 + \sum_{n=1}^\infty \pi_n (1 + bp/k)^{-n} = \pi_0 + \pi_0 \sum_{i=1}^k a_i \sum_{n=1}^\infty \Big(\frac{\lambda_i}{1 + bp/k}\Big)^n = \pi_0 + \sum_{i=1}^k \frac{\gamma_i\zeta_i}{p + \zeta_i}, \tag{1.20}$$
where
$$\zeta_i = \frac{k}{b}(1 - \lambda_i). \tag{1.20a}$$
Therefore
$$\Pr(w \le t) = \pi_0 + \sum_{i=1}^k \gamma_i (1 - e^{-\zeta_i t}) = 1 - \sum_{i=1}^k \gamma_i e^{-\zeta_i t}, \tag{1.21}$$
and
$$\bar w = \sum_{i=1}^k \gamma_i/\zeta_i, \tag{1.22}$$
which is in accord with Smith [8].
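The formulas (1.14a), (1.15) and (1.19) to (1.22) can be exercised on a case with a known answer. The sketch below assumes Poisson arrivals with Erlang-2 service ($M/E_2/1$), an illustrative special case: a short calculation (not in the paper) gives $F(\lambda) = \rho/(\rho + k(1 - \lambda))$, and for $k = 2$ the two roots inside the unit circle solve $2\lambda^2 - \rho\lambda - \rho = 0$. The results can then be compared with the Pollaczek-Khinchine mean wait $\lambda_{\text{arr}}E[S^2]/(2(1-\rho))$. Function names are hypothetical.

```python
import math


def queue_measures(lams, k, b):
    """pi_0, gamma_i (1.15), zeta_i (1.20a), mean queue (1.19) and mean wait (1.22),
    given the k distinct roots lambda_i of lambda^k = F(lambda) in |z| < 1."""
    a = []
    for i, li in enumerate(lams):
        prod = 1.0
        for j, lj in enumerate(lams):
            if j != i:
                prod *= li / (li - lj)
        a.append(prod)                                     # (1.14a)
    s = sum(ai / (1 - li) for ai, li in zip(a, lams))
    pi0 = 1 / s
    gamma = [ai * li / (1 - li) / s for ai, li in zip(a, lams)]
    zeta = [k * (1 - li) / b for li in lams]
    mean_q = sum(g * li ** k / (1 - li ** k) for g, li in zip(gamma, lams))
    mean_w = sum(g / z for g, z in zip(gamma, zeta))
    return pi0, gamma, zeta, mean_q, mean_w


def wait_cdf(t, gamma, zeta):
    """Pr(w <= t) = 1 - sum_i gamma_i exp(-zeta_i t), eq. (1.21); real roots assumed here."""
    return 1 - sum(g * math.exp(-z * t) for g, z in zip(gamma, zeta))


# Poisson arrivals, Erlang-2 service with mean b, rho = b / a.  For this case
# F(lambda) = rho / (rho + k (1 - lambda)); with k = 2 the two roots in |z| < 1
# are the roots of 2 lambda^2 - rho lambda - rho = 0.
rho, b, k = 0.5, 1.0, 2
disc = math.sqrt(rho * rho + 8 * rho)
lams = [(rho + disc) / 4, (rho - disc) / 4]
pi0, gamma, zeta, mean_q, mean_w = queue_measures(lams, k, b)
```

Here the arrival rate is $0.5$ and $E[S^2] = (1 + 1/k)b^2 = 1.5$, so Pollaczek-Khinchine gives a mean wait of $0.75$; Little's law then gives a mean number waiting of $0.375$, and $\pi_0$ should equal $1 - \rho = 0.5$.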
Suppose now that $\lambda_1$ is a double root of $\lambda^k = F(\lambda)$, and that all the other roots are simple; then $\lambda_1$ satisfies also
$$k\lambda_1^{k-1} = F'(\lambda_1) = \sum_{n=1}^\infty n\lambda_1^{n-1}\eta_n.$$
Substitute therefore
$$x_m = a_0\,m\lambda_1^{m-1} + \sum_{i=1}^{k-1} a_i\lambda_i^m \qquad \Big(\sum_{i=1}^{k-1} a_i = 1\Big).$$
Proceeding as before, we find $k$ linear equations with which to determine the $a_i$:


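$$-\omega_1 a_0 \sum_{n=0}^{k-m-1} (k - m - n)\,\omega_1^{k-m-n}\eta_n + \sum_{i=1}^{k-1} a_i \sum_{n=0}^{k-m-1} \omega_i^{k-m-n}\eta_n = 0 \quad (m = 1, \ldots, k - 1), \qquad \sum_{i=1}^{k-1} a_i = 1. \tag{1.23}$$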


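The matrix of this set of linear equations may be written as the product of nonsingular matrices:
$$C A_\omega, \qquad A_\omega = \begin{pmatrix} (k-1)\omega_1^{k-2} & \omega_1^{k-1} & \cdots & \omega_{k-1}^{k-1}\\ \vdots & \vdots & & \vdots\\ 1 & \omega_1 & \cdots & \omega_{k-1}\\ 0 & 1 & \cdots & 1 \end{pmatrix},$$
where $C$ is the matrix defined above. Taking $h$ as before and
$$\alpha = \{-\omega_1^2 a_0, a_1, \ldots, a_{k-1}\},$$
we have $C A_\omega \alpha = h$, and hence $\alpha = A_\omega^{-1} h$. The cofactors of the $k$th row are all readily calculated except that of the element $(k, 2)$, which cannot be expressed in a closed form.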

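The analysis proceeds along the lines used for single roots:
$$x_m = a_0\,m\lambda_1^{m-1} + \sum_{i=1}^{k-1} a_i\lambda_i^m, \qquad \sum_{m=0}^\infty x_m = \frac{a_0}{(1 - \lambda_1)^2} + \sum_{i=1}^{k-1} \frac{a_i}{1 - \lambda_i},$$
and $\pi_m = x_m / \sum_n x_n$, so that
$$\pi_0 = \Big[\frac{a_0}{(1 - \lambda_1)^2} + \sum_{i=1}^{k-1} \frac{a_i}{1 - \lambda_i}\Big]^{-1}.$$
Define
$$\gamma_0 = \frac{\pi_0\,a_0}{(1 - \lambda_1)^2}, \qquad \gamma_j = \frac{\pi_0\,a_j\lambda_j}{1 - \lambda_j} \quad (j = 1, \ldots, k - 1),$$
and note that $\sum_{j=0}^{k-1} \gamma_j = 1 - \pi_0$.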

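With the notation of (1.16),
$$q_i = \gamma_0\Big\{\lambda_1^{k(i-1)}\big[1 + k(i-1)(1-\lambda_1)\big] - \lambda_1^{ki}\big[1 + ki(1-\lambda_1)\big]\Big\} + \sum_{j=1}^{k-1} \gamma_j (1 - \lambda_j^k)\,\lambda_j^{k(i-1)},$$
and
$$Q(z) = \pi_0 + \gamma_0\Big[\frac{z(1 - \lambda_1^k)}{1 - z\lambda_1^k} - \frac{k(1 - \lambda_1)\lambda_1^k\,z(1 - z)}{(1 - z\lambda_1^k)^2}\Big] + z\sum_{j=1}^{k-1} \frac{\gamma_j(1 - \lambda_j^k)}{1 - z\lambda_j^k}$$
(note in passing that $Q(1) = \sum_{j=0}^{k-1} \gamma_j + \pi_0 = 1$), whence
$$\bar Q = \gamma_0\lambda_1^k\Big[\frac{1}{1 - \lambda_1^k} + \frac{k(1 - \lambda_1)}{(1 - \lambda_1^k)^2}\Big] + \sum_{j=1}^{k-1} \frac{\gamma_j\lambda_j^k}{1 - \lambda_j^k}. \tag{1.24}$$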


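Also, the waiting-time distribution has Laplace Transform
$$\phi(p) = \pi_0 + \gamma_0\,\frac{\zeta_1^2(1 + bp/k)}{(p + \zeta_1)^2} + \sum_{i=1}^{k-1} \frac{\gamma_i\zeta_i}{p + \zeta_i}, \tag{1.25}$$
and the mean waiting time is given by
$$\bar w = -\phi'(0) = (1 + \lambda_1)\frac{\gamma_0}{\zeta_1} + \sum_{i=1}^{k-1} \frac{\gamma_i}{\zeta_i}, \tag{1.26}$$
and so on to higher multiplicities.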


2. The system $D/E_k/1$. Lindley [5] obtained an integral equation for the waiting-time distribution in the system $GI/G/1$ and solved it for the system $D/E_k/1$; i.e., he took
$$A(u) = \begin{cases} 1 & \text{if } u \ge 1,\\ 0 & \text{if } u < 1, \end{cases}$$
and $k/b = \sigma$.

Then $F(\lambda) = e^{-\sigma(1-\lambda)}$ and the $\lambda_i$ are the solutions of $\lambda^k = e^{-\sigma(1-\lambda)}$. Put $-\sigma(1 - \lambda) = z$, and this becomes

$$e^{-z} = \Big(\frac{\sigma}{\sigma + z}\Big)^k, \tag{2.1}$$
which is Lindley's equation (17). These roots are distinct, for if $z_1$ were a double root of (2.1) it would also satisfy the equation
$$\frac{k\sigma^k}{(z_1 + \sigma)^{k+1}} = e^{-z_1} = \frac{\sigma^k}{(z_1 + \sigma)^k},$$
or $z_1 = k - \sigma$. Now $k$ is a fixed integer, so we would require $e^{-(k-\sigma)} = (\sigma/k)^k$, or $k^k e^{-k} = \sigma^k e^{-\sigma}$. But this is satisfied only by the value $\sigma = k$, since the function $x^k e^{-x}$ attains its maximum value at $x = k$. If $\sigma = k$, then $\lambda_1 = 1$; and yet by definition, $\lambda_1$ lies inside the unit circle.
Lindley obtains for the waiting-time distribution $G(t) = \Pr(w \le t) = 1 - \sum_{i=1}^k \gamma_i e^{-\zeta_i t}$, where the $\gamma_i$ are those of (1.15) and the $\zeta_i$ are as defined in (1.20a). For the $\gamma_i$, he has the linear equations


k 
(2.2) Bg a) oh @ (r = 0, 


” cl t= (o +2 yr 


»k ae l), 

which become, in terms of the quantities we have been using in Section 1, 
Yi ( 
= (r= 1,-:-,k — 1) 
a » GN = Lie ; 
whence 
k 
== n=l 


along with 


k 


>a: = 1, 
i=l 


to determine the a; . The matrix of the equations is the product 


eae . 3 
iO 1 


we | 
= 2.2 

| » 
} tt] 


ey oe oe 


and therefore they are equivalent (in the particular system considered by 
Lindley) to equation (1.13). 


Acknowledgment. I am grateful to Mr. Kendall for his invaluable assistance 
and encouragement throughout this work, and to the referees for several useful 
suggestions. 

REFERENCES 
[1] E. Brockmeyer, H. L. Halstrøm, and A. Jensen, The Life and Works of A. K. Erlang, Copenhagen, 1948.
[2] W. Feller, An Introduction to Probability Theory and Its Applications, Vol. I, John Wiley & Sons, Inc., New York, 1950.
[3] W. L. Ferrar, Algebra, Oxford University Press, 1941.
[4] D. G. Kendall, "Stochastic processes in the theory of queues," Ann. Math. Stat., Vol. 24 (1953), pp. 338-354.
[5] D. V. Lindley, "Theory of queues with a single server," Proc. Cambridge Philos. Soc., Vol. 48 (1952), pp. 277-289.
[6] F. Pollaczek, "Fonctions caractéristiques de certaines répartitions définies au moyen de la notion d'ordre. Application à la théorie des attentes," C. R. Acad. Sci. Paris, Vol. 234 (1952), pp. 2334-2336.
[7] F. Pollaczek, "Sur une généralisation de la théorie des attentes," C. R. Acad. Sci. Paris, Vol. 236 (1953), pp. 578-580.
[8] W. L. Smith, "On the distribution of queueing times," Proc. Cambridge Philos. Soc., Vol. 49 (1953), pp. 449-461.





ERRORS IN NORMAL APPROXIMATIONS TO THE $t$, $\tau$, AND SIMILAR
TYPES OF DISTRIBUTION$^{1,2}$


By J. T. Chu$^3$
University of North Carolina and Case Institute of Technology


1. Summary. For the cdf's of Student's ($t$) and Thompson's ($\tau$) distributions, upper and lower bounds are obtained in terms of the normal cdf. It is then shown that, in using the normal approximation for the cdf's of these distributions, the proportional errors are uniformly smaller than $1/n$ for all $n \ge 8$ and $n \ge 13$, respectively, where $n$ is the number of degrees of freedom. Similar methods may be used to derive bounds for cdf's of similar types. Examples are given.


2. Introduction. Let $F_n(x)$, $n = 1, 2, \ldots$, be a sequence of cdf's (cumulative distribution functions) such that for every fixed $x$, $F_n(x) \to F(x)$ as $n \to \infty$, where $F(x)$ is a cdf independent of $n$. From a practical point of view, it is desirable to know how large $n$ has to be in order that $D_n(x) = |F_n(x) - F(x)|$ be small enough so that $F(x)$ may be used as an approximation to $F_n(x)$, although approximations are often used in practice without much knowledge about the magnitudes of the errors. The function $D_n(x)$ may, of course, vary considerably for different values of $n$ and $x$. But the most interesting kinds of $D_n(x)$'s are probably those which tend rapidly to 0, uniformly in $x$. In such cases, $F(x)$ provides for all $n$'s greater than some minimum, and for all $x$'s, a satisfactory approximation for $F_n(x)$. Generally, however, even though there is ample numerical evidence that as $n$ increases $D_n(x)$ rapidly becomes uniformly small, it may not be easy to obtain a mathematical proof.

There are, on the other hand, types of sequences of cdf's for which we are able to confirm rigorously that they do tend rapidly to normality. Suppose that a cdf has one of the following forms:
$$F_n(x) = C_n \int_{-\infty}^x (1 \pm z^2/n)^{\mp m/2}\,dz,$$
where $C_n$ and $m$ depend only on $n$, a positive integer. (If the integrand is $(1 - z^2/n)^{m/2}$, it should be replaced by 0 when $|z| \ge \sqrt{n}$.) By simple transformations of the variable of integration, upper and lower bounds are found for $F_n(x)$ in terms of $\Phi(x)$, the unit-normal cdf specified by (6). These bounds may sometimes be further simplified by obtaining bounds on $C_n$. If $m/n \to 1$ as $n \to \infty$, then $\sqrt{2\pi}\,C_n \to 1$, and for every fixed $x$, both bounds for $F_n(x)$, and consequently

Received April 6, 1955.

$^1$ Most of the work was done at Chapel Hill under the sponsorship of the Office of Naval Research.

$^2$ Presented by title, "Errors in normal approximations to certain types of distribution functions," at the 1955 Western Regional Meeting of the IMS.

$^3$ Now with the Department of Mathematics, Case Institute of Technology.





ERRORS IN NORMAL APPROXIMATIONS 781 


$F_n(x)$ itself, tend to $\Phi(x)$. If $m/n$ and $\sqrt{2\pi}\,C_n$ tend rapidly to 1, then $F_n(x)$ tends rapidly to $\Phi(x)$ for all $x$. Therefore, in this case, the error in using $\Phi(x)$ as an approximation to $F_n(x)$ rapidly becomes uniformly small as $n$ increases.

In Section 4 applications are given to sequences of cdf's corresponding to Student's $t$-distribution, the $\tau$-distribution of W. R. Thompson [10], and the distributions of the partial and total correlation coefficients when the variates involved are independently and normally distributed. For most of these cdf's, we are able to show that the error bounds in using the normal approximation are small, although the actual errors may be even smaller.

An application to the $\chi^2$-distribution is given in Section 4.D. Similar methods were used by the author [1] to derive upper and lower bounds for the cdf of the sample median $\tilde x$ in terms of its asymptotic distribution function (which is normal). There, we also showed that if the parent distribution is normal, then, even for samples of moderate sizes, the error is small in using the normal approximation to the cdf of $\tilde x$. The cdf of $\tilde x$ can be reduced to one of the forms given above by several transformations. But some different arguments are also needed in order to get the bounds obtained in [1].

Another type of bound (also in terms of $\Phi$) is derived for the cdf's of the $t$- and $\tau$-distributions (see Equations (24) through (27)) by using the interrelationships between these cdf's and their bounds obtained by the methods described previously. In Section 5 some numerical comparisons are given of the two types of bounds and of two kinds of approximations (the normal and Hendricks' [6]) for the cdf of the $t$-distribution.


3. Lemmas.
Lemma 1.
$$1 + x \le e^x \quad \text{for all real } x, \tag{1}$$
$$1 + x \gtrless e^{x - x^2/2} \quad \text{according as } x \gtrless 0. \tag{2}$$
If $x \ge 0$, then
$$x/(1 - e^{-x^2})^{1/2} \ge 1, \tag{3}$$
and
$$x e^{-x^2}/(1 - e^{-x^2})^{1/2} \le 1. \tag{4}$$
Proof. The function $e^x - x - 1$ has its minimum 0 at $x = 0$, hence we have (1); (2) holds because $\log(1 + x) - x + x^2/2$ is monotonically increasing for all $x > -1$. Substituting $-x^2$ for $x$ in (1), we have (3), and (4) follows from the fact that the LHS (left-hand side) tends to 1 as $x \to 0$ and is a monotonically decreasing function of $x$. (Differentiate twice.)

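Lemma 2. Let
$$b_n(c) = \frac{1 \cdot 3 \cdots (2n - 1)}{2 \cdot 4 \cdots (2n)}\,(n + c)^{1/2},$$
where $c > -1$. Then,
$$b_n(c) < 1/\sqrt{\pi} \ \text{ if } c \le \tfrac14, \qquad b_n(c) > 1/\sqrt{\pi} \ \text{ if } c \ge \tfrac27. \tag{5}$$
Proof. $b_n(0)$ is known ([11], p. 351) as the Wallis product and tends to $1/\sqrt{\pi}$ as $n \to \infty$. Obviously $b_n(c)$ tends to the same limit for every fixed $c$. By examining the square of the ratio $b_{n+1}(c)/b_n(c)$, it can be shown that $b_n(c)$ is a strictly increasing function of $n$ if $c \le \tfrac14$, and is a strictly decreasing function of $n$ if $c \ge \tfrac27$ and $n \ge 2$. Hence, we have (5).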
If we have a chain of inequalities, as in (7) below, of the form $A_1 \le A_2 \le A_3 \le A_4 \le A_5$, where the $A$'s are functions of $m$, $n$, $x$, or other such quantities, the particular inequality $A_i \le A_j$ $(i < j)$ will be denoted by $(7.ij)$.
Lemma 3. Let
$$\Phi(x) = \int_{-\infty}^x (2\pi)^{-1/2} e^{-t^2/2}\,dt, \tag{6}$$
and $\Phi_0(x) = \Phi(x) - \tfrac12$. Then, (7.12) and (7.23) hold for all $m, n > 0$, and $0 \le x \le \sqrt{n}$; (7.34) holds for all $m, n > 0$, and $0 \le x < \infty$; and (7.45) for all $m > 3$, $n > 0$, and $0 \le x < \infty$:
$$\sqrt{\tfrac{n}{m+2}}\,\Phi_0\Big(x\sqrt{\tfrac{m+2}{n}}\Big) \le (2\pi)^{-1/2}\int_0^x (1 - z^2/n)^{m/2}\,dz \le \sqrt{\tfrac{n}{m}}\,\Phi_0\Big(x\sqrt{\tfrac{m}{n}}\Big) \le (2\pi)^{-1/2}\int_0^x (1 + z^2/n)^{-m/2}\,dz \le \sqrt{\tfrac{n}{m-3}}\,\Phi_0\Big(x\sqrt{\tfrac{m-3}{n}}\Big). \tag{7}$$
Proof. It is easy to see that (7.23) and (7.34) are immediate consequences of (1). Now use the transformation
$$v(z) = [n \log(1 + z^2/n)]^{1/2},$$
so that
$$\int_0^x (1 + z^2/n)^{-m/2}\,dz = \int_0^{v(x)} \exp[-(m - 3)v^2/2n]\,h(v/\sqrt{n})\,dv,$$
where $h(x)$ is the LHS of (4). By (2) and (4), $h(v/\sqrt{n}) \le 1$ and $v(x) \le x$. Hence we have (7.45). Finally, (7.12) can be obtained in a similar way by using (1) and (3) after applying to the integral of (7.12) the transformation
$$u(z) = [-n \log(1 - z^2/n)]^{1/2}.$$
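The chain (7) is easy to probe numerically. The sketch below evaluates its five members $A_1, \ldots, A_5$ by Simpson's rule for a few hypothetical $(m, n, x)$ with $m > 3$ and $0 \le x \le \sqrt{n}$, the range in which all four inequalities are asserted; it illustrates the lemma and is not a substitute for the proof. The helper names are illustrative.

```python
import math


def phi0(x):
    """Phi_0(x) = Phi(x) - 1/2 for the unit normal cdf Phi of (6)."""
    return 0.5 * math.erf(x / math.sqrt(2))


def simpson(f, lo, hi, steps=2000):
    h = (hi - lo) / steps
    s = f(lo) + f(hi)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * f(lo + i * h)
    return s * h / 3


def chain(m, n, x):
    """The five members A_1, ..., A_5 of (7); needs m > 3 and 0 <= x <= sqrt(n)."""
    c = 1 / math.sqrt(2 * math.pi)
    A1 = math.sqrt(n / (m + 2)) * phi0(x * math.sqrt((m + 2) / n))
    # max(0, .) guards against a tiny negative base from rounding at z = sqrt(n)
    A2 = c * simpson(lambda z: max(0.0, 1 - z * z / n) ** (m / 2), 0.0, x)
    A3 = math.sqrt(n / m) * phi0(x * math.sqrt(m / n))
    A4 = c * simpson(lambda z: (1 + z * z / n) ** (-m / 2), 0.0, x)
    A5 = math.sqrt(n / (m - 3)) * phi0(x * math.sqrt((m - 3) / n))
    return A1, A2, A3, A4, A5
```

For example, `chain(4, 5, math.sqrt(5))` gives five increasing values, roughly 0.450, 0.476, 0.534, 0.573 and 0.763.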


Lemma 4. Suppose $n_0$ is a fixed integer, and for every integer $n \ge n_0$,
$$F_n(x) = C_n \int_{-\infty}^x (1 \pm z^2/n)^{\mp m/2}\,dz \tag{8}$$
is a cdf, where $C_n$ and $m$ depend only on $n$ and $\lim_{n\to\infty} m/n = 1$. (If the integrand is $(1 - z^2/n)^{m/2}$, it should be replaced by 0 whenever $|z| \ge \sqrt{n}$.) Then, for every fixed $x$,
$$\lim_{n\to\infty} F_n(x) = \Phi(x), \tag{9}$$
where $\Phi(x)$ is defined by (6).
Proof. By Lemma 3, we have $\lim_{n\to\infty} C_n = 1/\sqrt{2\pi}$. Using the same lemma once again, we obtain (9).


4. Normal approximations. We showed in Lemma 4 that if a cdf is of one of the types (8), then it tends to $\Phi$ as $n \to \infty$. In this section we shall use Lemmas 2 and 3 to prove that for several well-known sequences of cdf's of these types, the "speed" of approaching the limiting cdf is "uniformly rapid." Therefore, in using $\Phi(x)$ as an approximation to these cdf's, the error is small for all values of $x$, if $n$ is greater than a certain minimum.

A. $t$-distribution. The cdf of the $t$-distribution with $n$ d.f. (degrees of freedom), $n = 1, 2, \ldots$, is given by
$$F_n(x) = \int_{-\infty}^x a_n (1 + z^2/n)^{-(n+1)/2}\,dz, \tag{10}$$
where
$$a_n = (n\pi)^{-1/2}\,\Gamma\big(\tfrac{n+1}{2}\big)\big/\Gamma(n/2). \tag{11}$$
It is well known that as $n \to \infty$, $F_n(x) \to \Phi(x)$ for every fixed $x$, and that the "speed" of approaching the limit is rather fast. In fact, the normal approximation is often used in practice when $n \ge 30$. We shall derive for $F_n(x)$ upper and lower bounds in terms of $\Phi(x)$, then show that the proportional error in using $\Phi(x)$ as an approximation to $F_n(x)$ is less than $1/n$ for all $x$ and all $n \ge 8$.
Applying (7.45) to $F_n(y) - \tfrac12$ and $\tfrac12 - F_n(-x)$ and using the fact that $\Phi(-x) = 1 - \Phi(x)$, it can be shown easily that for arbitrary $x, y \ge 0$, and $n \ge 3$,
$$F_n(y) - F_n(-x) \le a_n\sqrt{2\pi n/(n-2)}\,\big[\Phi\big(y\sqrt{(n-2)/n}\big) - \Phi\big(-x\sqrt{(n-2)/n}\big)\big]. \tag{12}$$
From (7.34) we obtain, in a similar way,
$$F_n(y) - F_n(-x) \ge a_n\sqrt{2\pi n/(n+1)}\,\big[\Phi\big(y\sqrt{(n+1)/n}\big) - \Phi\big(-x\sqrt{(n+1)/n}\big)\big]. \tag{13}$$
Using $\Gamma(x + 1) = x\Gamma(x)$ and $\Gamma(\tfrac12) = \sqrt{\pi}$, it can be seen that for any $c > -1$, $a_{2m} = b_m(c)\sqrt{m/(2m + 2c)}$ and $a_{2m+1} = \sqrt{(m + c)/(2m + 1)}\big/\big(\pi\,b_m(c)\big)$, where
784 J. T. CHU 


$m = 1, 2, \ldots$. Letting $c$ be $\tfrac14$ and $\tfrac27$ in turn, we obtain, by (5), $\sqrt{2\pi}\,a_n < \sqrt{1 - 1/(2n + 1)}$ if $n = 2m$ and $< \sqrt{1 - 3/(7n)}$ if $n = 2m + 1$, $m = 1, 2, \ldots$. In general, for $n \ge 3$,
$$\sqrt{2\pi}\,a_n < \sqrt{1 - 3/(7n)}. \tag{14}$$
Likewise, letting $c = \tfrac12$ and $\tfrac14$, respectively, we obtain $\sqrt{2\pi}\,a_n > \sqrt{n/(n+1)}$ if $n = 2m$ and $> \sqrt{1 - 1/(2n)}$ if $n = 2m + 1$, $m = 1, 2, \ldots$. In general, for $n \ge 2$,
$$\sqrt{2\pi}\,a_n > \sqrt{n/(n+1)}. \tag{15}$$
(Direct comparison shows that (15) holds for $n = 1$.)
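The bounds (14) and (15), and the two halves of Lemma 2 with the constants $\tfrac14$ and $\tfrac27$ as read here, can be spot-checked with log-gamma arithmetic. The sketch below scans a finite, arbitrarily chosen range of $n$, so it illustrates rather than proves the inequalities; the helper names are hypothetical.

```python
import math


def sqrt2pi_a(n):
    """sqrt(2 pi) * a_n, with a_n = (n pi)^{-1/2} Gamma((n+1)/2) / Gamma(n/2) as in (11)."""
    log_a = -0.5 * math.log(n * math.pi) + math.lgamma((n + 1) / 2) - math.lgamma(n / 2)
    return math.sqrt(2 * math.pi) * math.exp(log_a)


def b(n, c):
    """b_n(c) of Lemma 2, using [1*3*...*(2n-1)]/[2*4*...*(2n)] = Gamma(n+1/2)/(sqrt(pi) n!)."""
    ratio = math.exp(math.lgamma(n + 0.5) - math.lgamma(n + 1)) / math.sqrt(math.pi)
    return ratio * math.sqrt(n + c)


upper_ok = all(sqrt2pi_a(n) < math.sqrt(1 - 3 / (7 * n)) for n in range(3, 500))   # (14)
lower_ok = all(sqrt2pi_a(n) > math.sqrt(n / (n + 1)) for n in range(1, 500))        # (15)
lemma2_ok = all(b(n, 0.25) < 1 / math.sqrt(math.pi) < b(n, 2 / 7) for n in range(1, 500))
```

Working with `math.lgamma` avoids overflow of $\Gamma$ for large $n$; the gaps being tested shrink like $1/n$ or $1/n^2$, which is still far above double-precision rounding over this range.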
From (12) through (15), we have, for arbitrary $x, y \ge 0$, and $n \ge 3$,
$$F_n(y) - F_n(-x) < \sqrt{(7n - 3)/(7n - 14)}\,\big[\Phi\big(y\sqrt{(n-2)/n}\big) - \Phi\big(-x\sqrt{(n-2)/n}\big)\big], \tag{16}$$
$$F_n(y) - F_n(-x) > \big(n/(n+1)\big)\big[\Phi\big(y\sqrt{(n+1)/n}\big) - \Phi\big(-x\sqrt{(n+1)/n}\big)\big]. \tag{17}$$
The proportional error in using $A$ as an approximation to $B$ is defined to be
$$E = |(B/A) - 1|. \tag{18}$$
Now, omitting $\sqrt{1 - 2/n} < 1$ and $\sqrt{1 + 1/n} > 1$ in the arguments of the $\Phi$'s of (16) and (17), we see that $E$ is not more than the maximum of $\sqrt{(7n - 3)/(7n - 14)} - 1$ and $1 - n/(n + 1)$. For simplicity, we may state that $E < 1/n$ for all $n \ge 8$. The actual values of $E$ are often much smaller than $1/n$. For example, if $n = 30$, and $y = x = 2.042$, then $F_n(y) - F_n(-x) = 0.95$ while $\Phi(y) - \Phi(-x) = 0.9588$, so $E = 0.0092$. Nevertheless, the bound $1/n$ is independent of $x$ and $y$, and small enough to justify, in a rigorous manner, the use of the normal approximation, provided that $n$ is not too small. More numerical comparisons are given in Section 5.
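The worked example above ($n = 30$, $y = x = 2.042$) and the threshold $n \ge 8$ can be reproduced as follows. In this sketch the $t$ probability is obtained by Simpson's rule on the density (10) rather than from tables, which is an assumption of the illustration; the function names are hypothetical.

```python
import math


def Phi(x):
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))


def a_n(n):
    """a_n of (11)."""
    return math.exp(-0.5 * math.log(n * math.pi) + math.lgamma((n + 1) / 2) - math.lgamma(n / 2))


def t_central(n, x, steps=4000):
    """T = F_n(x) - F_n(-x) for Student's t, by Simpson's rule on the density in (10)."""
    an = a_n(n)
    f = lambda z: an * (1 + z * z / n) ** (-(n + 1) / 2)
    h = x / steps
    s = f(0.0) + f(x)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * f(i * h)
    return 2 * s * h / 3


def bounds_16_17(n, x):
    """Lower bound (17) and upper bound (16) for F_n(x) - F_n(-x), with y = x."""
    upper = math.sqrt((7 * n - 3) / (7 * n - 14)) * (2 * Phi(x * math.sqrt((n - 2) / n)) - 1)
    lower = n / (n + 1) * (2 * Phi(x * math.sqrt((n + 1) / n)) - 1)
    return lower, upper


def first_n_with_bound_below_1_over_n(n_max=1000):
    """Smallest n0 with max(sqrt((7n-3)/(7n-14)) - 1, 1 - n/(n+1)) < 1/n on [n0, n_max]."""
    ok = lambda n: max(math.sqrt((7 * n - 3) / (7 * n - 14)) - 1, 1 - n / (n + 1)) < 1 / n
    n0 = n_max
    for n in range(n_max, 2, -1):
        if not ok(n):
            break
        n0 = n
    return n0


n, x = 30, 2.042
T = t_central(n, x)
A = 2 * Phi(x) - 1
E = abs(T / A - 1)          # proportional error (18)
lower, upper = bounds_16_17(n, x)
```

Here `T` comes out at about 0.95, `A` at about 0.9588 and `E` at about 0.009, while the threshold search returns 8.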

B. Thompson's $\tau$-distribution. The cdf of the $\tau$-distribution with $n$ d.f. is given ([2], p. 241) by
$$G_n(x) = \int_{-\sqrt{n}}^x \alpha_n (1 - z^2/n)^{(n-3)/2}\,dz, \tag{19}$$
where $|x| \le \sqrt{n}$, $\alpha_n = (n\pi)^{-1/2}\,\Gamma(n/2)/\Gamma((n-1)/2)$, and $n = 2, 3, \ldots$. For applications of the $\tau$-distribution, the reader is referred to ([2], p. 390) and [10]. Obviously by (11), $\alpha_n = a_{n-1}\sqrt{1 - 1/n}$. Using (7), then (14) and (15), we obtain for $x, y \ge 0$, and $n \ge 4$,








$$G_n(y) - G_n(-x) \le a_{n-1}\sqrt{2\pi(n-1)/(n-3)}\,\big[\Phi\big(y\sqrt{(n-3)/n}\big) - \Phi\big(-x\sqrt{(n-3)/n}\big)\big] < \sqrt{(7n-10)/(7n-21)}\,\big[\Phi\big(y\sqrt{(n-3)/n}\big) - \Phi\big(-x\sqrt{(n-3)/n}\big)\big], \tag{20}$$
$$G_n(y) - G_n(-x) \ge a_{n-1}\sqrt{2\pi}\,\big[\Phi\big(y\sqrt{(n-1)/n}\big) - \Phi\big(-x\sqrt{(n-1)/n}\big)\big] \ge \sqrt{(n-1)/n}\,\big[\Phi\big(y\sqrt{(n-1)/n}\big) - \Phi\big(-x\sqrt{(n-1)/n}\big)\big] \ge (1 - 1/n)\,[\Phi(y) - \Phi(-x)]. \tag{21}$$
The inequality (21.34) is obtained by using the fact that $\Phi_0(\alpha x) \ge \alpha\Phi_0(x)$ if $0 \le \alpha \le 1$. Thus, in using $\Phi(y) - \Phi(-x)$ as an approximation to $G_n(y) - G_n(-x)$, the proportional error $E$, as defined by (18), is not more than the maximum of $\sqrt{(7n - 10)/(7n - 21)} - 1$ and $1 - (1 - 1/n)$. For $n \ge 13$, this maximum is $1/n$.

The $t$- and $\tau$-distributions are closely related. If $x$ has a $t$-distribution with $n$ d.f., then $y = x\sqrt{(n+1)/(n + x^2)}$ has a $\tau$-distribution with $n + 1$ d.f. Conversely, if $y$ has a $\tau$-distribution with $n$ d.f., then $x = y\sqrt{(n-1)/(n - y^2)}$ has a $t$-distribution with $n - 1$ d.f. Thus,
$$F_n(x) = G_{n+1}\big(x\sqrt{(n+1)/(n + x^2)}\big), \tag{22}$$
$$G_n(x) = F_{n-1}\big(x\sqrt{(n-1)/(n - x^2)}\big), \tag{23}$$
where $F_n(x)$ and $G_n(x)$ are defined by (10) and (19). New upper and lower bounds for $F_n(y) - F_n(-x)$ and $G_n(y) - G_n(-x)$ can be obtained. For example, by (22), (20.12), and (21.12), we have
$$F_n(y) - F_n(-x) \le a_n\sqrt{2\pi n/(n-2)}\,\big[\Phi\big(y\sqrt{(n-2)/(n + y^2)}\big) - \Phi\big(-x\sqrt{(n-2)/(n + x^2)}\big)\big], \tag{24}$$
$$F_n(y) - F_n(-x) \ge a_n\sqrt{2\pi}\,\big[\Phi\big(y\sqrt{n/(n + y^2)}\big) - \Phi\big(-x\sqrt{n/(n + x^2)}\big)\big]. \tag{25}$$
Similarly, by (23), (12), and (13), we have
$$G_n(y) - G_n(-x) \le a_{n-1}\sqrt{2\pi(n-1)/(n-3)}\,\big[\Phi\big(y\sqrt{(n-3)/(n - y^2)}\big) - \Phi\big(-x\sqrt{(n-3)/(n - x^2)}\big)\big], \tag{26}$$
$$G_n(y) - G_n(-x) \ge a_{n-1}\sqrt{2\pi(n-1)/n}\,\big[\Phi\big(y\sqrt{n/(n - y^2)}\big) - \Phi\big(-x\sqrt{n/(n - x^2)}\big)\big]. \tag{27}$$

Obviously, the upper bound (24) is better than that in (12). But neither of the lower bounds in (25) or (13) is better than the other. (See the tables given in Section 5.) The same may be said about the upper and lower bounds for $F_n(y) - F_n(-x)$, given by (16) and (17), and those obtained by (22), (20.13), and (21.13). Further, the upper bound for $G_n(y) - G_n(-x)$ given by (20.12) is better than the one in (26). But neither of the lower bounds given by (21.12) or (27) is better than the other. For example, the RHS of (27) is at most $B(n) = a_{n-1}\sqrt{2\pi(n-1)/n}$, whereas the RHS of (21.12) is close to $A(n) = 2a_{n-1}\sqrt{2\pi}\,\Phi_0(\sqrt{n-1})$ if $y = x$ is close to $\sqrt{n}$. Since $A(5) > B(5)$, in this case the RHS of (21.12) exceeds the RHS of (27). On the other hand, if $y = x = 1$, then the RHS of (27) is $D(n) = 2a_{n-1}\sqrt{2\pi(n-1)/n}\,\Phi_0\big(\sqrt{n/(n-1)}\big)$ and the RHS of (21.12) is $C(n) = 2a_{n-1}\sqrt{2\pi}\,\Phi_0\big(\sqrt{(n-1)/n}\big)$. Since $C(5) < D(5)$, in this case the RHS of (21.12) is smaller than the RHS of (27).
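The two numerical comparisons $A(5) > B(5)$ and $C(5) < D(5)$, and the threshold $n \ge 13$ for the $\tau$-distribution, can be reproduced with a few lines. This is a sketch using the definitions of $A(n)$, $B(n)$, $C(n)$, $D(n)$ stated above; the function names are hypothetical.

```python
import math


def phi0(x):
    return 0.5 * math.erf(x / math.sqrt(2))


def a_n(n):
    """a_n of (11)."""
    return math.exp(-0.5 * math.log(n * math.pi) + math.lgamma((n + 1) / 2) - math.lgamma(n / 2))


def abcd(n):
    """A(n), B(n), C(n), D(n) as defined in the comparison of (21.12) with (27)."""
    s = a_n(n - 1)
    r2pi = math.sqrt(2 * math.pi)
    A = 2 * s * r2pi * phi0(math.sqrt(n - 1))
    B = s * math.sqrt(2 * math.pi * (n - 1) / n)
    C = 2 * s * r2pi * phi0(math.sqrt((n - 1) / n))
    D = 2 * s * math.sqrt(2 * math.pi * (n - 1) / n) * phi0(math.sqrt(n / (n - 1)))
    return A, B, C, D


def first_tau_n(n_max=1000):
    """Smallest n0 >= 4 with sqrt((7n-10)/(7n-21)) - 1 <= 1/n for all n in [n0, n_max]."""
    n0 = n_max
    for n in range(n_max, 3, -1):
        if math.sqrt((7 * n - 10) / (7 * n - 21)) - 1 > 1 / n:
            break
        n0 = n
    return n0


A5, B5, C5, D5 = abcd(5)
```

With these definitions $A(5) \approx 0.897$, $B(5) \approx 0.841$, $C(5) \approx 0.591$ and $D(5) \approx 0.619$, and the threshold search returns 13.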

C. Correlation Coefficients. Let a sample of size $n + 1$ be drawn from a $k$-variate $(k \ge 2)$ normal distribution with variates $x_1, x_2, \ldots, x_k$. Let $r_{12\cdot 3\ldots k}$ be the sample partial correlation coefficient between $x_1$ and $x_2$ after elimination of the remaining variates $x_3, \ldots, x_k$. If, actually, $x_1, \ldots, x_k$ are independently distributed, then the pdf of $r_{12\cdot 3\ldots k}$ is (see [2], p. 412) $\sqrt{n_1}\,\alpha_{n_1}(1 - z^2)^{(n_1 - 3)/2}$, where $|z| \le 1$, and $n_1 = n - k + 2$. If $k = 2$, then the corresponding pdf is the pdf of the total correlation coefficient $r_{12}$. The variance of $r_{12\cdot 3\ldots k}$ is $1/n_1$. The cdf of $\sqrt{n_1}\,r_{12\cdot 3\ldots k}$ is $G_{n_1}(x)$, where $G_n(x)$ is given by (19). Therefore, the proportional error in using $\Phi(y) - \Phi(-x)$ to approximate $G_{n_1}(y) - G_{n_1}(-x)$ is not more than $1/n_1$ (for $n_1 \ge 13$). Hotelling ([7], p. 196) stated: "This [the normal approximation] is in ordinary cases the most convenient method of all [methods for evaluating $G_{n_1}(x)$], but no suitable bound for the error is available at present." The bound we obtain here seems acceptable, at least when $n$ is large compared with $k$.

D. $\chi^2$-distribution. It is well known ([2], p. 251) that if $x$ has a $\chi^2$-distribution with $n$ d.f., then both $x_1 = (x - n)/\sqrt{2n}$ and $x_2 = \sqrt{2x} - \sqrt{2n}$ are asymptotically normally distributed with mean 0 and variance 1. According to R. A. Fisher ([5], p. 81), the distribution of $x_3 = \sqrt{2x} - \sqrt{2n - 1}$ tends to normality even "faster." Let $F_n(x)$ be the cdf of any of the $x_i$'s. We tried unsuccessfully to derive both upper and lower bounds for $F_n(y) - F_n(-x)$, similar to those given in (16), (17), (20), and (21). It is not difficult, however, to obtain just a lower bound, in terms of $\Phi_0$, for $F_n(y) - F_n(0)$, where $y \ge 0$; and an upper bound, also in terms of $\Phi_0$, for $F_n(0) - F_n(-x)$, where $x \ge 0$. The results are simple, but not sufficient to provide complete mathematical justification for using the normal approximation for the distributions of the $x_i$'s.

In the following we shall show briefly how to derive a lower bound and an upper bound, respectively, for $H_n(y) - H_n(0)$ and $H_n(0) - H_n(-x)$, where $H_n$ is the cdf of $\sqrt{2\chi^2} - \sqrt{2m}$, $m = n - 1$, $x, y \ge 0$, and $\chi^2$ has a $\chi^2$-distribution with $n$ d.f. Exactly the same technique may be used to derive the corresponding results for the $x_i$'s. But they are less neat, and therefore will be omitted.
If $y \ge 0$, then
$$H_n(y) - H_n(0) = 2^{-n/2}\,\Gamma^{-1}(n/2) \int_m^{y_n} x^{n/2 - 1} e^{-x/2}\,dx = \Gamma^{-1}(n/2)\,\big(m/(2e)\big)^{m/2} \int_0^y \Big\{\big[1 + z/\sqrt{2m}\big] \exp\big[-z/\sqrt{2m} - \tfrac12 z^2/(2m)\big]\Big\}^m dz, \tag{28}$$
where $y_n = (y + \sqrt{2m})^2/2$, $z = \sqrt{2x} - \sqrt{2m}$, and $\Gamma^{-1}(x) = 1/\Gamma(x)$. Using (2) and $\Gamma(n + 1) < \sqrt{2\pi}\,n^{n + 1/2} \exp[-n + 1/(12n)]$ (see [11], p. 352), it can be shown that $\Gamma^{-1}(n/2)\big(m/(2e)\big)^{m/2} \ge 1/\sqrt{2\pi}$ for $n \ge 4$. Now, applying (2) to the first factor of the integrand in the second integral of (28), we have, for all $y \ge 0$ and $n \ge 4$,
$$H_n(y) - H_n(0) \ge \Phi_0(y).$$
Similarly, if $0 \le x \le \sqrt{2m}$, and $n \ge 4$, then
$$H_n(0) - H_n(-x) \le \Phi_0(x).$$
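The change of variable in (28) and the resulting lower bound can be checked numerically. The sketch below computes $H_n(y) - H_n(0)$ twice, once from the $\chi^2$ density (first integral) and once from the transformed integrand (second integral), both by Simpson's rule, and also scans the constant $\Gamma^{-1}(n/2)(m/(2e))^{m/2}$ over a finite range of $n$. The parameter choices and function names are illustrative assumptions.

```python
import math


def phi0(x):
    return 0.5 * math.erf(x / math.sqrt(2))


def simpson(f, lo, hi, steps=4000):
    h = (hi - lo) / steps
    s = f(lo) + f(hi)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * f(lo + i * h)
    return s * h / 3


def const_28(n):
    """Gamma(n/2)^{-1} (m / (2e))^{m/2} with m = n - 1, the constant in (28)."""
    m = n - 1
    return math.exp(-math.lgamma(n / 2) + (m / 2) * math.log(m / (2 * math.e)))


def increment_direct(n, y):
    """H_n(y) - H_n(0) = P(m <= X <= y_n) for X ~ chi^2_n: first integral in (28)."""
    m = n - 1
    log_norm = -(n / 2) * math.log(2) - math.lgamma(n / 2)
    pdf = lambda x: math.exp(log_norm + (n / 2 - 1) * math.log(x) - x / 2)
    return simpson(pdf, m, (y + math.sqrt(2 * m)) ** 2 / 2)


def increment_transformed(n, y):
    """The same quantity from the second integral in (28), in z = sqrt(2x) - sqrt(2m)."""
    m = n - 1
    r = math.sqrt(2 * m)
    g = lambda z: ((1 + z / r) * math.exp(-z / r - 0.5 * z * z / (2 * m))) ** m
    return const_28(n) * simpson(g, 0.0, y)
```

Agreement of the two functions confirms the substitution $x = (z + \sqrt{2m})^2/2$; the lower bound $\Phi_0(y)$ then follows because the constant is at least $1/\sqrt{2\pi}$ and, by (2), the integrand is at least $e^{-z^2/2}$.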


5. Numerical comparisons. We shall now give some numerical comparisons of two known approximations for the cdf $F_n(x)$ of the $t$-distribution and the upper and lower bounds for $F_n(x)$ given by (12), (13), (24), and (25). One of the approximations is $\Phi(x)$, and the other, suggested by W. A. Hendricks ([6], p. 216), is $\Phi(x_1)$, where
$$x_1 = x\,(a_n\sqrt{2\pi})\sqrt{2n/(2n + x^2)}. \tag{29}$$
In the tables given below, we choose $n$ to be 10, 30, 60, and 120. For each $n$, values of $x$ are obtained for which $T = F_n(x) - F_n(-x) = 0.50$, 0.75, 0.90, 0.95, and 0.99, respectively. For each pair of $x$ and $n$, we compute $A = \Phi(x) - \Phi(-x)$ and $A_1 = \Phi(x_1) - \Phi(-x_1)$; $U$ and $L$, the RHS of (12) and (13); and $U_1$ and $L_1$, the RHS of (24) and (25). We then tabulate the differences between the values of $T$ and the bounds and approximations corresponding to the same $n$ and $x$. For example, $D_A = A - T$, $D_{U_1} = U_1 - T$, etc. We use [13] and [3] to find the values of the $\Phi$ and $\Gamma$ functions.

Various approximations for $T$ have been suggested. (See, for example, [4], [6], [9], and [12].) Only $A$ and $A_1$ are tabulated here. It seems that $A_1$ is a better approximation than $A$, particularly if the d.f. is small. We also point out that $\Phi(y_1) - \Phi(-x_1) >$ RHS of (25), where $y_1$ is the RHS of (29) with $x$ replaced by $y$, because $0 < a_n\sqrt{2\pi} < 1$, $\alpha\Phi_0(x) \le \Phi_0(\alpha x)$ if $0 \le \alpha \le 1$, and for every fixed $x$, $n/(n + x^2)$ is an increasing function of $n$. Computations indicate that $\Phi(y_1) - \Phi(-x_1) <$ RHS of (24), but we are not able to prove it mathematically. (Using the fact that $\Phi_0(\alpha x) \le \alpha\Phi_0(x)$ if $1 \le \alpha$, it can be shown that the RHS of (24) $> \Phi(y_1') - \Phi(-x_1')$, where $x_1'$ is the RHS of (29) with $2n$ replaced by $n$, and $y_1'$ is the same quantity corresponding to $y$.)
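The comparison between $A$ and Hendricks' $A_1$, and the inequality $A_1 > L_1$ (RHS of (25) with $y = x$), can be illustrated at two points. The sketch assumes $x = 2.228$ for $n = 10$ and $x = 2.042$ for $n = 30$, which are (to the digits shown) the usual two-sided 95% points of the $t$-distribution; $T$ is computed by numerical quadrature of (10) instead of tables. Function names are hypothetical.

```python
import math


def Phi(x):
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))


def a_n(n):
    """a_n of (11)."""
    return math.exp(-0.5 * math.log(n * math.pi) + math.lgamma((n + 1) / 2) - math.lgamma(n / 2))


def t_central(n, x, steps=4000):
    """T = F_n(x) - F_n(-x), by Simpson's rule on the t density (10)."""
    an = a_n(n)
    f = lambda z: an * (1 + z * z / n) ** (-(n + 1) / 2)
    h = x / steps
    s = f(0.0) + f(x)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * f(i * h)
    return 2 * s * h / 3


def compare(n, x):
    """Return T, A, A_1 and L_1 (the RHS of (25) with y = x)."""
    c = a_n(n) * math.sqrt(2 * math.pi)
    x1 = x * c * math.sqrt(2 * n / (2 * n + x * x))        # (29)
    T = t_central(n, x)
    A = 2 * Phi(x) - 1
    A1 = 2 * Phi(x1) - 1
    L1 = c * (2 * Phi(x * math.sqrt(n / (n + x * x))) - 1)
    return T, A, A1, L1


results = {n: compare(n, x) for n, x in [(10, 2.228), (30, 2.042)]}
```

At both points $A_1$ is much closer to $T \approx 0.95$ than $A$ is, consistent with the remark that $A_1$ is the better approximation when the d.f. is small.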

The tables also indicate, among other things, that $L$ is always closer to $T$ than $U$ (the actual values of $U$, computed from (12), are greater than 1 when $T = 0.95$ and $n = 10$, and when $T = 0.99$ and $n = 10$, 30, and 60), and that all the bounds and approximations tend monotonically to $T$. Again, we are not able to prove or disprove these findings.





TABLES

[Tables: for $n$ = 10, 30, 60, 120 and $T$ = 0.50, 0.75, 0.90, 0.95, 0.99, the differences between $T$ and the quantities $A$, $A_1$, $U$, $L$, $U_1$, $L_1$ defined above.]





6. Some problems. The referee of this paper mentioned several sequences of distributions, related to the $t$-distribution and noncentral $t$-distribution and known, through numerical investigations, to approach normality rapidly. He asked whether methods of this paper could be used to derive suitable bounds for the errors in using the normal approximation for these distributions. One of them [8] is the distribution of $\bar x + ks$, where $\bar x$ and $s$ are the sample mean and standard deviation of a sample drawn from a normal distribution, and the others, suggested by J. W. Tukey and ascribed to him by C. P. Winsor in [12],
are the distributions of (n + 2)t/(n + 2 + #) and (7n/5 + 1)t/((7n/5) + 1+ 20), 
where $t$ has a $t$-distribution with $n$ d.f.

The author thinks the question is a very interesting one and hopes some answer will be found in further research. At present he wishes to mention that the methods used in this paper do have some other applications. For example, by using the transformations $u(z)$ and $v(z)$, given in the proof of Lemma 3, and similar ones, bounds can be obtained, in terms of the cdf of the $\chi^2$-distribution with $m$ d.f., for the cdf of the $F$-distribution with $m$ and $n$ d.f. and for the distribution of $nx$, where $x$ has a $B$-distribution with $m$ and $n$ d.f. The corresponding upper and lower bounds are close to each other if $n$ is large compared with $m$.


7. Acknowledgment. The author wishes to thank William Kruskal for his 
comments and suggestions. Thanks are also due to Prof. H. Hotelling for his 
critical reading of the original manuscript and to Lorrie D. Sylvester for some 
help on computing the tables. 


REFERENCES 

[1] J. T. Chu, "On the distribution of the sample median," Ann. Math. Stat., Vol. 26 (1955), pp. 112-116.

[2] H. Cramér, Mathematical Methods of Statistics, Princeton University Press, 1946.

[3] H. T. Davis, Tables of the Higher Mathematical Functions, Vol. 1, The Principia Press, Inc., Bloomington, Indiana, 1933.

[4] W. E. Deming and R. T. Birge, "On the statistical theory of errors," Rev. Modern Physics, Vol. 6 (1934), pp. 119-161.

[5] R. A. Fisher, Statistical Methods for Research Workers, 10th ed., Oliver and Boyd, Edinburgh, 1948.

[6] W. A. Hendricks, "An approximation to Student's distribution," Ann. Math. Stat., Vol. 7 (1936), pp. 210-221.

[7] H. Hotelling, "New light on the correlation coefficient and its transforms," J. Roy. Stat. Soc., Series B, Vol. 15 (1953), pp. 193-232.

[8] W. J. Jennett and B. L. Welch, "The control of proportion defective as judged by a single quality characteristic varying on a continuous scale," J. Roy. Stat. Soc., Suppl., Vol. 6 (1939), pp. 80-88.

[9] "Student," "The probable error of a mean," Biometrika, Vol. 6 (1908), pp. 1-25.

[10] W. R. Thompson, "On a criterion for the rejection of observations and the distribution of the ratio of deviation to sample standard deviation," Ann. Math. Stat., Vol. 6 (1935), pp. 214-219.

[11] J. V. Uspensky, Introduction to Mathematical Probability, McGraw-Hill, New York, 1937.

[12] C. P. Winsor, "Biometry," Medical Physics, Vol. 1, edited by O. Glasser, The Year Book Publishers, Inc., Chicago, pp. 89-110.

[13] Tables of Normal Probability Functions, National Bureau of Standards, Washington, D. C., 1953.





ON THE STOCHASTIC INDEPENDENCE OF TWO SECOND-DEGREE
POLYNOMIAL STATISTICS IN NORMALLY
DISTRIBUTED VARIATES


By R. G. Laha


Indian Statistical Institute 

A remarkable property of the normal law as proved by Craig [1] is that if $x_1, x_2, \ldots, x_n$ are $n$ identically and independently distributed normal variates each with zero mean and unit variance, then the necessary and sufficient condition for the stochastic independence of two real homogeneous quadratic statistics $Q_1 = xAx'$ and $Q_2 = xBx'$ is that the matrix product $AB = 0$. The same theorem has also been proved independently by Hotelling [2], Sakamoto [5], Matusita [3], and Ogawa [4].

In the present paper we shall establish a corresponding theorem for the case 
of two second-degree polynomial statistics in normally distributed variates, and 
give some related results. 

THEOREM 1. Let $x_1, x_2, \ldots, x_n$ be $n$ independently and identically distributed normal variates, each with zero mean and unit variance; then the necessary and sufficient condition that two real polynomial statistics of the second degree, denoted by $P_1 = xAx' + lx'$ and $P_2 = xBx' + mx'$, are stochastically independent is that

$$\text{(i) } AB = 0, \qquad \text{(ii) } lB = 0, \qquad \text{(iii) } mA = 0, \qquad \text{(iv) } lm' = 0.$$

Here $x$, $l$, and $m$ respectively represent the row vectors $(x_1, x_2, \ldots, x_n)$, $(l_1, l_2, \ldots, l_n)$, and $(m_1, m_2, \ldots, m_n)$; $x'$, $l'$, and $m'$, as usual, represent their transposes; and $A = (a_{ij})$ and $B = (b_{ij})$ are both real symmetric matrices of order $n$.
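As an illustration only (it is not part of the paper), the four conditions of Theorem 1 can be checked mechanically once $A$, $B$, $l$, and $m$ are written down. The Python sketch below assumes exact integer or rational entries so that "$= 0$" can be tested exactly. The helper names `matmul`, `is_zero`, `laha_conditions`, and `independent` are hypothetical, and the $2 \times 2$ example, $P_1 = (x_1 + x_2)^2 + (x_1 + x_2)$ and $P_2 = (x_1 - x_2)^2 + (x_1 - x_2)$, was chosen only for illustration.

```python
def matmul(X, Y):
    """Matrix product X @ Y for matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def is_zero(X):
    """True when every entry of X is exactly zero."""
    return all(entry == 0 for row in X for entry in row)


def laha_conditions(A, B, l, m):
    """Conditions (i)-(iv) of Theorem 1 for P1 = xAx' + lx' and P2 = xBx' + mx'.

    A, B: real symmetric n x n matrices; l, m: length-n row vectors.
    Integer or Fraction entries are assumed, so "= 0" is tested exactly.
    """
    l_row, m_row = [list(l)], [list(m)]  # l and m as 1 x n matrices
    return {
        "(i) AB = 0": is_zero(matmul(A, B)),
        "(ii) lB = 0": is_zero(matmul(l_row, B)),
        "(iii) mA = 0": is_zero(matmul(m_row, A)),
        "(iv) lm' = 0": sum(a * b for a, b in zip(l, m)) == 0,
    }


def independent(A, B, l, m):
    """By Theorem 1, P1 and P2 are independent iff all four conditions hold."""
    return all(laha_conditions(A, B, l, m).values())


# P1 = (x1 + x2)^2 + (x1 + x2),  P2 = (x1 - x2)^2 + (x1 - x2)
A = [[1, 1], [1, 1]]
B = [[1, -1], [-1, 1]]
print(independent(A, B, l=[1, 1], m=[1, -1]))  # True
# Replacing m by (1, 0), i.e. P2 = (x1 - x2)^2 + x1, violates (iii) and (iv)
print(laha_conditions(A, B, l=[1, 1], m=[1, 0]))
```

With floating-point entries one would replace the exact zero tests by a tolerance; the exact version is kept here because the conditions are algebraic identities.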

PROOF OF SUFFICIENCY. Without any loss of generality we can write $t_1$ and $t_2$ in place of $it_1$ and $it_2$, respectively, so that the characteristic function of the joint distribution of $P_1$ and $P_2$ is given by

$$(1)\qquad \phi(t_1, t_2) = E[\exp(t_1P_1 + t_2P_2)].$$

Hence,

$$(2)\qquad \phi(\tfrac12 t_1, \tfrac12 t_2) = E[\exp(\tfrac12 t_1P_1 + \tfrac12 t_2P_2)] = |I - t_1A - t_2B|^{-1/2}\exp\{\tfrac18(t_1l + t_2m)(I - t_1A - t_2B)^{-1}(t_1l + t_2m)'\}.$$

Now, putting $t_2 = 0$ and $t_1 = 0$ alternately in (2), we get

$$(3a)\qquad \phi(\tfrac12 t_1, 0) = |I - t_1A|^{-1/2}\exp\{\tfrac18 t_1^2\, l(I - t_1A)^{-1}l'\},$$

$$(3b)\qquad \phi(0, \tfrac12 t_2) = |I - t_2B|^{-1/2}\exp\{\tfrac18 t_2^2\, m(I - t_2B)^{-1}m'\},$$

Received May 27, 1955 





STOCHASTIC INDEPENDENCE 791 


where $\phi(\tfrac12 t_1, 0)$ and $\phi(0, \tfrac12 t_2)$ represent the characteristic functions of the marginal distributions of $P_1$ and $P_2$, respectively.
When AB = 0, we get, after a little simplification, 


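$$(4a)\qquad |I - t_1A|\,|I - t_2B| = |I - t_1A - t_2B|,$$

$$(4b)\qquad (I - t_1A)^{-1} + (I - t_2B)^{-1} - I = (I - t_1A - t_2B)^{-1},$$

and

$$(4c)\qquad \begin{aligned} &(t_1l + t_2m)(I - t_1A - t_2B)^{-1}(t_1l + t_2m)' - t_1^2\, l(I - t_1A)^{-1}l' - t_2^2\, m(I - t_2B)^{-1}m' \\ &\quad = t_1^2t_2\, lB(I - t_2B)^{-1}l' + t_1t_2^2\, m(I - t_1A)^{-1}Am' \\ &\qquad + 2t_1^2t_2\, l(I - t_1A)^{-1}Am' + 2t_1t_2^2\, lB(I - t_2B)^{-1}m' \\ &\qquad + 2t_1t_2\, lm'. \end{aligned}$$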

Thus, when $lB = 0$, $mA = 0$, and $lm' = 0$, in addition to the condition $AB = 0$, the expression on the right-hand side of (4c) vanishes, yielding the relation

$$(5)\qquad (t_1l + t_2m)(I - t_1A - t_2B)^{-1}(t_1l + t_2m)' = t_1^2\, l(I - t_1A)^{-1}l' + t_2^2\, m(I - t_2B)^{-1}m'.$$


Then, using (4a) and (5) together in (2), (3a), and (3b), we get $\phi(t_1, t_2) = \phi(t_1, 0)\,\phi(0, t_2)$, establishing the stochastic independence of $P_1$ and $P_2$.
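The identities (4a) and (5), on which the sufficiency proof rests, can be spot-checked in exact rational arithmetic. The sketch below is an illustration, not the author's method. It evaluates both sides of (4a) and of (5) at one rational point $(t_1, t_2)$ using Python's `fractions.Fraction`, with hypothetical helpers `shifted`, `det`, `solve`, and `quad_inv`. The data are an illustrative choice: $A = \begin{pmatrix}1 & 1\\ 1 & 1\end{pmatrix}$, $B = \begin{pmatrix}1 & -1\\ -1 & 1\end{pmatrix}$, $l = (1, 1)$, $m = (1, -1)$, which satisfy (i)-(iv).

```python
from fractions import Fraction


def shifted(A, B, t1, t2):
    """The matrix I - t1*A - t2*B with exact rational entries."""
    n = len(A)
    return [[Fraction(int(i == j)) - t1 * A[i][j] - t2 * B[i][j] for j in range(n)]
            for i in range(n)]


def det(M):
    """Determinant by Gaussian elimination in exact arithmetic."""
    M = [[Fraction(x) for x in row] for row in M]
    n, d = len(M), Fraction(1)
    for c in range(n):
        p = next((r for r in range(c, n) if M[r][c] != 0), None)
        if p is None:
            return Fraction(0)
        if p != c:
            M[c], M[p] = M[p], M[c]
            d = -d
        d *= M[c][c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n):
                M[r][k] -= f * M[c][k]
    return d


def solve(M, b):
    """Solve M z = b by Gauss-Jordan elimination (M assumed nonsingular)."""
    n = len(M)
    aug = [[Fraction(x) for x in row] + [Fraction(b[i])] for i, row in enumerate(M)]
    for c in range(n):
        p = next(r for r in range(c, n) if aug[r][c] != 0)
        aug[c], aug[p] = aug[p], aug[c]
        pivot = aug[c][c]
        aug[c] = [x / pivot for x in aug[c]]
        for r in range(n):
            if r != c and aug[r][c] != 0:
                f = aug[r][c]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[c])]
    return [row[n] for row in aug]


def quad_inv(v, M):
    """The quadratic form v M^{-1} v' for a row vector v and symmetric M."""
    return sum(Fraction(a) * z for a, z in zip(v, solve(M, v)))


def check_4a_and_5(A, B, l, m, t1, t2):
    """Compare both sides of (4a) and of (5) at the single point (t1, t2)."""
    same_det = (det(shifted(A, B, t1, 0)) * det(shifted(A, B, 0, t2))
                == det(shifted(A, B, t1, t2)))
    v = [t1 * a + t2 * b for a, b in zip(l, m)]
    lhs = quad_inv(v, shifted(A, B, t1, t2))
    rhs = t1**2 * quad_inv(l, shifted(A, B, t1, 0)) + t2**2 * quad_inv(m, shifted(A, B, 0, t2))
    return same_det, lhs == rhs


A, B = [[1, 1], [1, 1]], [[1, -1], [-1, 1]]
print(check_4a_and_5(A, B, [1, 1], [1, -1], Fraction(1, 5), Fraction(1, 7)))  # (True, True)
```

A single point proves nothing, of course; the check only guards against algebraic slips. Taking $A = B = \operatorname{diag}(1, 0)$ instead, so that $AB \ne 0$, makes the determinant comparison in (4a) fail.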

PROOF OF NECESSITY. Here it is given that the relation $\phi(t_1, t_2) = \phi(t_1, 0)\,\phi(0, t_2)$ holds identically for all real $t_1$ and $t_2$, so that from (2), (3a), and (3b), on squaring, we have the relation

$$(6)\qquad \exp\{\tfrac14[(t_1l + t_2m)(I - t_1A - t_2B)^{-1}(t_1l + t_2m)' - t_1^2\, l(I - t_1A)^{-1}l' - t_2^2\, m(I - t_2B)^{-1}m']\} = \frac{|I - t_1A - t_2B|}{|I - t_1A|\,|I - t_2B|}.$$

Thus, from (6) we see that the relation

$$\exp[P(t_1, t_2)/Q(t_1, t_2)] = R(t_1, t_2)/S(t_1, t_2)$$

holds identically for all real $t_1$ and $t_2$, where $P$, $Q$, $R$, and $S$ are polynomials in $t_1$ and $t_2$.

But it can easily be proved that in such a case the rational functions $P/Q$ and $R/S$ are constants. Hence, (6) gives two conditions:

$$(7a)\qquad |I - t_1A|\,|I - t_2B| = C_1\,|I - t_1A - t_2B|,$$

$$(7b)\qquad (t_1l + t_2m)(I - t_1A - t_2B)^{-1}(t_1l + t_2m)' - t_1^2\, l(I - t_1A)^{-1}l' - t_2^2\, m(I - t_2B)^{-1}m' = C_2,$$

to be satisfied for all $t_1$ and $t_2$, where $C_1$ and $C_2$ are constants.
But, putting $t_1 = 0$ in (7a) and (7b), it follows that $C_1 = 1$ and $C_2 = 0$,








792 R. G. LAHA 


so that we have

$$(8a)\qquad |I - t_1A|\,|I - t_2B| = |I - t_1A - t_2B|,$$

$$(8b)\qquad (t_1l + t_2m)(I - t_1A - t_2B)^{-1}(t_1l + t_2m)' - t_1^2\, l(I - t_1A)^{-1}l' - t_2^2\, m(I - t_2B)^{-1}m' = 0.$$


It has already been proved in [1], [2], [3], [4], and [5] that if (8a) holds identically for all $t_1$ and $t_2$, then $AB = 0$.
Now, by virtue of the condition $AB = 0$, the left-hand side of (8b) reduces to the right-hand side of (4c). Hence, we have

$$(8c)\qquad t_1^2t_2\, lB(I - t_2B)^{-1}l' + t_1t_2^2\, m(I - t_1A)^{-1}Am' + 2t_1^2t_2\, l(I - t_1A)^{-1}Am' + 2t_1t_2^2\, lB(I - t_2B)^{-1}m' + 2t_1t_2\, lm' = 0,$$


holding identically for all $t_1$ and $t_2$. Then, restricting the values of $t_1$ and $t_2$ to the neighbourhoods of the origin $|t_1| < 1/\alpha$ and $|t_2| < 1/\beta$, where $\alpha$ and $\beta$ denote the largest of the absolute values of the latent roots of the matrices $A$ and $B$, respectively, we have the power series expansions

$$(9a)\qquad (I - t_1A)^{-1} = I + t_1A + t_1^2A^2 + \cdots,$$

$$(9b)\qquad (I - t_2B)^{-1} = I + t_2B + t_2^2B^2 + \cdots.$$

Substituting the expressions on the right-hand sides of (9a) and (9b) in (8c) above, and collecting the coefficients of $t_1t_2$ and $t_1^2t_2^2$, we get

$$(10a)\qquad lm' = 0,$$

$$(10b)\qquad lBBl' + mAAm' = 0.$$


Since the elements of $A$, $B$, $l$, and $m$ are all real, and $lBBl' = (lB)(lB)'$ and $mAAm' = (mA)(mA)'$ are sums of squares, (10b) at once gives $lB = 0$ and $mA = 0$. Again, when $lB = 0$, $mA = 0$, and $lm' = 0$, (8c) above is satisfied for all $t_1$ and $t_2$, which completes the proof.


The extension to correlated normal variates is also simple. Let $x = (x_1, x_2, \ldots, x_n)$ be $n$-variate normal with mean vector zero and variance-covariance matrix $\Sigma$. Then the necessary and sufficient condition that $P_1 = xAx' + lx'$ and $P_2 = xBx' + mx'$ are stochastically independent is that

$$\text{(i) } A\Sigma B = 0, \qquad \text{(ii) } l\Sigma B = 0, \qquad \text{(iii) } m\Sigma A = 0, \qquad \text{(iv) } l\Sigma m' = 0.$$

The proof follows by using a real nonsingular linear transformation $y = xT$ such that $TT' = \Sigma^{-1}$. Using this transformation, we have $P_1 = yA_0y' + l_0y'$ and $P_2 = yB_0y' + m_0y'$, where $A = TA_0T'$, $B = TB_0T'$, $l = l_0T'$, and $m = m_0T'$; further, $y_1, y_2, \ldots, y_n$ are independently normally distributed, each with zero mean and unit variance. Then, using the above theorem, we get the set of necessary and sufficient conditions

$$(11)\qquad \text{(i) } A_0B_0 = 0, \qquad \text{(ii) } l_0B_0 = 0, \qquad \text{(iii) } m_0A_0 = 0, \qquad \text{(iv) } l_0m_0' = 0.$$







Next, rewriting the conditions in (11) in terms of $A$, $B$, $l$, and $m$, and using the relation $TT' = \Sigma^{-1}$, we get

$$\text{(i) } A\Sigma B = 0, \qquad \text{(ii) } l\Sigma B = 0, \qquad \text{(iii) } m\Sigma A = 0, \qquad \text{(iv) } l\Sigma m' = 0.$$
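For the correlated case, the conditions $A\Sigma B = 0$, $l\Sigma B = 0$, $m\Sigma A = 0$, and $l\Sigma m' = 0$ can be checked in the same mechanical way. The sketch below uses a hypothetical example: $\Sigma = \begin{pmatrix} 2 & 1 \\ 1 & 2\end{pmatrix}$, $P_1 = x_1^2 + x_1$, and $P_2 = (x_1 - 2x_2)^2 + (x_1 - 2x_2)$. Since $\operatorname{Cov}(x_1, x_1 - 2x_2) = 2 - 2 = 0$ under this $\Sigma$, independence is expected, whereas with $\Sigma = I$ the condition $AB = 0$ fails. The helper names are illustrative assumptions.

```python
def matmul(X, Y):
    """Matrix product X @ Y for matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def is_zero(X):
    """True when every entry of X is exactly zero."""
    return all(entry == 0 for row in X for entry in row)


def correlated_conditions(A, B, l, m, Sigma):
    """Conditions (i)-(iv) for x ~ N(0, Sigma), using exact integer/rational entries."""
    l_row, m_row, m_col = [list(l)], [list(m)], [[v] for v in m]
    return {
        "(i) A Sigma B = 0": is_zero(matmul(matmul(A, Sigma), B)),
        "(ii) l Sigma B = 0": is_zero(matmul(matmul(l_row, Sigma), B)),
        "(iii) m Sigma A = 0": is_zero(matmul(matmul(m_row, Sigma), A)),
        "(iv) l Sigma m' = 0": is_zero(matmul(matmul(l_row, Sigma), m_col)),
    }


Sigma = [[2, 1], [1, 2]]
A = [[1, 0], [0, 0]]     # P1 = x1^2 + x1
B = [[1, -2], [-2, 4]]   # P2 = (x1 - 2*x2)^2 + (x1 - 2*x2)
l, m = [1, 0], [1, -2]
print(all(correlated_conditions(A, B, l, m, Sigma).values()))             # True
print(all(correlated_conditions(A, B, l, m, [[1, 0], [0, 1]]).values()))  # False: AB != 0
```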
COROLLARY I (Extension to the noncentral case). Let $x_1, x_2, \ldots, x_n$ be $n$ independent normal variates distributed with means $\mu_1, \mu_2, \ldots, \mu_n$, but having the same variance, say unity. Then the necessary and sufficient condition for the stochastic independence of two second-degree polynomial statistics $P_1 = xAx' + lx'$ and $P_2 = xBx' + mx'$ is that

$$\text{(i) } AB = 0, \qquad \text{(ii) } lB = 0, \qquad \text{(iii) } mA = 0, \qquad \text{(iv) } lm' = 0.$$
PROOF. Let us take $y = x - \mu$, where $y = (y_1, y_2, \ldots, y_n)$, $x = (x_1, x_2, \ldots, x_n)$, and $\mu = (\mu_1, \mu_2, \ldots, \mu_n)$. Then we have

$$P_1 = xAx' + lx' = yAy' + (2\mu A + l)y' + \mu A\mu' + l\mu',$$
$$P_2 = xBx' + mx' = yBy' + (2\mu B + m)y' + \mu B\mu' + m\mu'.$$

Now $y_1, y_2, \ldots, y_n$ are also distributed independently normally, each having zero mean and unit variance, and the proof follows from the above theorem, simply replacing $l$ by $2\mu A + l$ and $m$ by $2\mu B + m$. (Given $AB = 0$, the conditions $(2\mu A + l)B = 0$ and $(2\mu B + m)A = 0$ reduce to $lB = 0$ and $mA = 0$, and then $(2\mu A + l)(2\mu B + m)' = lm'$.) The corresponding extension to the correlated case, when the variates are distributed with an arbitrary mean vector $(\mu_1, \mu_2, \ldots, \mu_n)$ and variance-covariance matrix $\Sigma$, also follows immediately.
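The substitution used in the proof of Corollary I can be traced numerically. With $y = x - \mu$ the linear coefficients become $2\mu A + l$ and $2\mu B + m$, while the constants $\mu A\mu' + l\mu'$ and $\mu B\mu' + m\mu'$ play no role in independence. The sketch below, which uses hypothetical helper names and exact integer arithmetic, applies the shift for an arbitrarily chosen mean vector $\mu = (3, -1)$ and rechecks the four conditions.

```python
def matmul(X, Y):
    """Matrix product X @ Y for matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def is_zero(X):
    """True when every entry of X is exactly zero."""
    return all(entry == 0 for row in X for entry in row)


def all_conditions(A, B, l, m):
    """True iff AB = 0, lB = 0, mA = 0 and lm' = 0."""
    return (is_zero(matmul(A, B)) and is_zero(matmul([list(l)], B))
            and is_zero(matmul([list(m)], A)) and sum(a * b for a, b in zip(l, m)) == 0)


def shift_linear_terms(A, B, l, m, mu):
    """Linear coefficients after y = x - mu: l -> 2 mu A + l and m -> 2 mu B + m."""
    mu_A = matmul([list(mu)], A)[0]
    mu_B = matmul([list(mu)], B)[0]
    return ([2 * a + b for a, b in zip(mu_A, l)],
            [2 * a + b for a, b in zip(mu_B, m)])


A, B = [[1, 1], [1, 1]], [[1, -1], [-1, 1]]
l_star, m_star = shift_linear_terms(A, B, [1, 1], [1, -1], mu=[3, -1])
print(l_star, m_star)                        # [5, 5] [9, -9]
print(all_conditions(A, B, l_star, m_star))  # True
```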

COROLLARY II.¹ Let $x_1, x_2, \ldots, x_n$ be $n$ identically and independently distributed normal variates, each having zero mean and unit variance. If two real polynomial statistics of the second degree, denoted by $P_1 = xAx' + lx'$ and $P_2 = xBx' + mx'$, are stochastically independent, then there always exists an orthogonal transformation, given by $y = xP$, reducing simultaneously both $P_1(x_1, x_2, \ldots, x_n)$ to $P_1^*(y_1, y_2, \ldots, y_k)$ and $P_2(x_1, x_2, \ldots, x_n)$ to $P_2^*(y_{k+1}, y_{k+2}, \ldots, y_n)$, such that $P_1^*$ and $P_2^*$ do not contain any common variate.

In this connection it is interesting to note that a more general and difficult problem has been suggested by Prof. Yu. V. Linnik during his recent seminar at the Indian Statistical Institute in Calcutta. It is his conjecture that if two polynomial statistics $P_1(x_1, x_2, \ldots, x_n)$ and $P_2(x_1, x_2, \ldots, x_n)$ in identically and independently distributed normal variates $x_1, x_2, \ldots, x_n$ are stochastically independent, then there exists an orthogonal transformation $y = xP$ reducing simultaneously both $P_1(x_1, x_2, \ldots, x_n)$ to $P_1^*(y_1, y_2, \ldots, y_k)$ and $P_2(x_1, x_2, \ldots, x_n)$ to $P_2^*(y_{k+1}, y_{k+2}, \ldots, y_n)$, such that the new polynomials $P_1^*$ and $P_2^*$ do not contain any common variate; that is, we can "unlink" the polynomials in such a case. Here we give a partial solution to this problem when both the polynomials are of the second degree. We note further that the above corollary generalizes a corresponding result due to Hotelling [2] on the stochastic independence of two real homogeneous quadratic statistics.

¹ While this paper was in press, the author learned in a communication from Prof. Yu. V. Linnik that the results contained in Corollary II have also been obtained independently by Prof. A. A. Zinger of the University of Leningrad. But his method of proof is not known to the author.

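PROOF. From the above theorem, it follows that when $P_1$ and $P_2$ are stochastically independent, we have

$$AB = 0, \qquad lB = 0, \qquad mA = 0, \qquad\text{and}\qquad lm' = 0.$$

Let $\alpha_1, \alpha_2, \ldots, \alpha_r$ and $\beta_1, \beta_2, \ldots, \beta_s$ denote the non-zero latent roots of the matrices $A$ and $B$, respectively, such that $r + s \le n$. Now there exists an orthogonal matrix $C$ such that $CAC' = D$ and $CBC' = E$, where

$$D = \begin{pmatrix} D_\alpha & 0 \\ 0 & 0 \end{pmatrix}, \qquad E = \begin{pmatrix} E_1 & E_2 \\ E_2' & F \end{pmatrix},$$

the row and column blocks having orders $r$ and $n - r$, and $D_\alpha$ being the diagonal matrix consisting of the non-zero latent roots $\alpha_1, \alpha_2, \ldots, \alpha_r$, given by

$$D_\alpha = \operatorname{diag}(\alpha_1, \alpha_2, \ldots, \alpha_r).$$

The matrix $C$ being orthogonal, the relation (8a) reduces to

$$(12)\qquad |I - t_1D|\,|I - t_2E| = |I - t_1D - t_2E|.$$

Next, equating the coefficients of $t_1^r$ on both sides of (12), we get

$$(13)\qquad |I - t_2E| = |I - t_2F|,$$

holding for all $t_2$, where $F$ is the symmetric matrix of order $n - r$ given above. Now, from (13), it can easily be shown that the non-zero latent roots of the matrix $F$ are also given by $\beta_1, \beta_2, \ldots, \beta_s$. Hence, there exists an orthogonal matrix $C_1$ of order $n - r$ such that

$$C_1FC_1' = \begin{pmatrix} E_\beta & 0 \\ 0 & 0 \end{pmatrix}$$

(blocks of orders $s$ and $n - r - s$), where $E_\beta$ is the diagonal matrix of order $s$ formed by the non-zero latent roots $\beta_1, \beta_2, \ldots, \beta_s$, given by

$$E_\beta = \operatorname{diag}(\beta_1, \beta_2, \ldots, \beta_s).$$

Let us now put

$$C_2 = \begin{pmatrix} I_r & 0 \\ 0 & C_1 \end{pmatrix}$$

and write $C_3 = C_2C$. Then it follows that $C_3$ is an orthogonal matrix and further that

$$C_3AC_3' = \begin{pmatrix} D_\alpha & 0 \\ 0 & 0 \end{pmatrix}$$

and

$$C_3BC_3' = \begin{pmatrix} E_1 & E_2C_1' \\ C_1E_2' & C_1FC_1' \end{pmatrix}.$$

Let us denote $C_3BC_3'$ by $G$; then it follows that the latent roots of $G$ are the same as those of $B$. But since the matrix $G$ is symmetric, the sum of the squares of its elements is equal to the sum of the squares of its non-zero latent roots, and hence equal to $\sum_{j=1}^{s}\beta_j^2$. Since the block $C_1FC_1'$ alone already contributes $\sum_{j=1}^{s}\beta_j^2$, it follows that $E_1 = 0$ and $E_2 = 0$. Hence, we have proved the existence of an orthogonal matrix $C_3$ reducing simultaneously both the matrices $A$ and $B$ to their canonical forms such that

$$C_3AC_3' = D_0 = \begin{pmatrix} D_\alpha & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} \qquad\text{and}\qquad C_3BC_3' = E_0 = \begin{pmatrix} 0 & 0 & 0 \\ 0 & E_\beta & 0 \\ 0 & 0 & 0 \end{pmatrix},$$

the diagonal blocks having orders $r$, $s$, and $n - r - s$, where $D_\alpha$ and $E_\beta$ are defined above.
Let us now suppose that this orthogonal transformation, defined by $x = yC_3$, reduces the vectors $l$ and $m$ to $\lambda$ and $\mu$, respectively, such that $\lambda = lC_3'$ and $\mu = mC_3'$. Then the conditions $lB = 0$, $mA = 0$, and $lm' = 0$ give us $\lambda E_0 = 0$, $\mu D_0 = 0$, and $\lambda\mu' = 0$, respectively. Thus, from the form of $D_0$ and $E_0$, it follows that the vectors $\lambda$ and $\mu$ should be of the forms

$$\lambda = (\lambda_1, \ldots, \lambda_r, 0, \ldots, 0, \lambda_{r+s+1}, \ldots, \lambda_n),$$
$$\mu = (0, \ldots, 0, \mu_{r+1}, \ldots, \mu_{r+s}, \mu_{r+s+1}, \ldots, \mu_n),$$

where the elements satisfy the relation

$$\sum_{j=r+s+1}^{n}\lambda_j\mu_j = 0.$$

At first we note that if $r + s = n$, the above orthogonal transformation, defined by $x = yC_3$, reduces simultaneously both

$$P_1 = xAx' + lx' \quad\text{to}\quad P_1^* = \sum_{j=1}^{r}\alpha_jy_j^2 + \sum_{j=1}^{r}\lambda_jy_j,$$
$$P_2 = xBx' + mx' \quad\text{to}\quad P_2^* = \sum_{j=1}^{s}\beta_jy_{r+j}^2 + \sum_{j=r+1}^{n}\mu_jy_j,$$

such that the new polynomials $P_1^*$ and $P_2^*$ have no common variate, which completes the proof.
But if $r + s < n$, we take

$$C_4 = \begin{pmatrix} I_{r+s} & 0 \\ 0 & C_5 \end{pmatrix},$$

where $C_5$ is an orthogonal matrix of order $n - r - s$ such that its first two row vectors are given by

$$(\lambda_{r+s+1}/\lambda_0, \ldots, \lambda_n/\lambda_0) \qquad\text{and}\qquad (\mu_{r+s+1}/\mu_0, \ldots, \mu_n/\mu_0),$$

where

$$\lambda_0^2 = \sum_{j=r+s+1}^{n}\lambda_j^2 \qquad\text{and}\qquad \mu_0^2 = \sum_{j=r+s+1}^{n}\mu_j^2.$$

Thus, if we define the above transformation by $x = zP$, where $P = C_4C_3$ is an orthogonal matrix, it follows easily that this transformation reduces simultaneously both

$$P_1 = xAx' + lx' \quad\text{to}\quad P_1^* = \sum_{j=1}^{r}\alpha_jz_j^2 + \sum_{j=1}^{r}\lambda_jz_j + \lambda_0z_{r+s+1},$$
$$P_2 = xBx' + mx' \quad\text{to}\quad P_2^* = \sum_{j=1}^{s}\beta_jz_{r+j}^2 + \sum_{j=r+1}^{r+s}\mu_jz_j + \mu_0z_{r+s+2},$$

such that $P_1^*$ and $P_2^*$ do not have any common variate. Hence the proof.
The proof of the above corollary for the noncentral normal case is immediate.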
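For small examples an unlinking transformation of the kind asserted in Corollary II can be exhibited directly. The sketch below does not implement the general construction of the proof. As an assumption for this $2 \times 2$ example, it takes the orthogonal matrix $P$ whose columns are the eigenvectors $(1, 1)/\sqrt2$ and $(1, -1)/\sqrt2$, rewrites $P_1 = (x_1+x_2)^2 + (x_1+x_2)$ and $P_2 = (x_1-x_2)^2 + (x_1-x_2)$ in the variates $y = xP$ (so $x = yP'$), and reports which $y_j$ each polynomial involves. The helper names are hypothetical.

```python
import math


def matmul(X, Y):
    """Matrix product X @ Y for matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def transpose(X):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*X)]


def transform(A, l, P):
    """Coefficients of xAx' + lx' in the variates y = xP (P orthogonal, so x = yP')."""
    return matmul(matmul(transpose(P), A), P), matmul([list(l)], P)[0]


def variates_used(A, l, tol=1e-12):
    """Indices j for which y_j actually appears in yAy' + ly'."""
    n = len(l)
    return {j for j in range(n)
            if abs(l[j]) > tol or any(abs(A[j][k]) > tol or abs(A[k][j]) > tol for k in range(n))}


s = 1 / math.sqrt(2)
P = [[s, s], [s, -s]]  # columns: eigenvectors of A and of B
A1, l1 = transform([[1, 1], [1, 1]], [1, 1], P)
B1, m1 = transform([[1, -1], [-1, 1]], [1, -1], P)
print(variates_used([[1, 1], [1, 1]], [1, 1]), variates_used([[1, -1], [-1, 1]], [1, -1]))
print(variates_used(A1, l1), variates_used(B1, m1))  # {0} {1}
```

Here $P_1^* = 2y_1^2 + \sqrt2\,y_1$ and $P_2^* = 2y_2^2 + \sqrt2\,y_2$, so the two polynomials share no variate.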
REFERENCES 

[1] A. T. Craig, "Note on the independence of certain quadratic forms," Ann. Math. Stat., Vol. 14 (1943), pp. 195-197.

[2] H. Hotelling, "Note on a matric theorem of A. T. Craig," Ann. Math. Stat., Vol. 15 (1944), pp. 427-429.

[3] K. Matusita, "Note on the independence of certain statistics," Ann. Inst. Stat. Math., Tokyo, Vol. 1 (1949), pp. 79-82.

[4] J. Ogawa, "On the independence of bilinear and quadratic forms of a random sample from a normal population," Ann. Inst. Stat. Math., Tokyo, Vol. 1 (1949), pp. 83-108.

[5] H. Sakamoto, "On the independence of two statistics," Res. Mem. Inst. Stat. Math., Tokyo, Vol. 1 (1944), pp. 1-25.































THE WAGR SEQUENTIAL t-TEST REACHES A DECISION
WITH PROBABILITY ONE¹


By Herbert T. David² and William H. Kruskal
University of Chicago 

0. Summary. The WAGR test³ is a sequential procedure for testing the null hypothesis that the proportion of a normal population greater than a given constant is $p_0$ (given) against the alternative that it is $p_1$ (given). These are equivalent (after a translation) to hypotheses specifying the value of $\mu/\sigma$, where $\mu$ and $\sigma^2$ are the mean and the variance of the normal population under test. We prove that, with probability one, a decision is reached when the WAGR test is applied. This fact is of importance in its own right; it also has indirect interest because, unless it were true, the standard Wald inequalities on probabilities of error at the two hypothesis points could not be applied.

1. Introduction. The WAGR sequential test for one-sided proportion defective may formally be described as follows. Let $X_1, X_2, \ldots$ be a sequence of independent identically distributed normal random variables with mean $\mu$ and variance $\sigma^2$. Let $U$ be a given number. Let

$$u_n = \sum_{i=1}^{n}(U - X_i)\Big/\Big(\sum_{i=1}^{n}(U - X_i)^2\Big)^{1/2}.$$

Let $\alpha$ and $\beta$ be given numbers, between 0 and 1, such that $\alpha + \beta < 1$. Define for reference the inequality

$$(1.1)\qquad \ln\frac{\beta}{1-\alpha} < l_n(u_n) < \ln\frac{1-\beta}{\alpha},$$

where

$$(1.2)\qquad l_n(u_n) = \tfrac12(n - u_n^2)(K_0^2 - K_1^2) + \ln\frac{Hh_{n-1}(-u_nK_1)}{Hh_{n-1}(-u_nK_0)},$$

$$(1.3)\qquad Hh_{\nu}(x) = \int_0^\infty \frac{z^{\nu}}{\nu!}\exp[-\tfrac12(z + x)^2]\,dz,$$


Received May 31, 1955. 

¹ This work was sponsored by the Army, Navy, and Air Force through the Joint Services Advisory Committee for Research Groups in Applied Mathematics and Statistics by Contract No. N6ori-02035 (NR 342 043).

² Now at Iowa State College, Ames, Iowa.

³ The name WAGR stems from the initials of those individuals who suggested and developed the test in question: A. Wald, K. Arnold, Goldberg ([4], p. 83n), and Rushton [8]. The name of G. A. Barnard [1] should be added to this list, and in fact a particular case of the WAGR test has been called Barnard's test in at least one exposition [3]. The test may be called a (one-sided) sequential t-test, although perhaps this name should be reserved for the special case in which $p_0 = \frac12$ (i.e., $\mu = 0$ under the null hypothesis).
797 





798 HERBERT T. DAVID AND WILLIAM H. KRUSKAL


and $K_1$, $K_0$ are given. $K_i$ ($i = 0, 1$) is the unit-normal deviate exceeded with given probability $p_i$. It is no loss of generality to take $p_0 < p_1$, and consequently $K_1 < K_0$.

Proceed by observing $X_1$, $X_2$, and computing $u_2$ and $l_2(u_2)$. If (1.1) is broken at the left, accept $H_0\colon (U - \mu)/\sigma = K_0$; if (1.1) is broken at the right, accept $H_1\colon (U - \mu)/\sigma = K_1$. If (1.1) is satisfied, observe $X_3$, compute $u_3$ and $l_3(u_3)$, and look at (1.1) with $n = 3$. If (1.1) is broken, take the appropriate decision; otherwise, continue taking observations, each time looking at (1.1), until (1.1) is broken. When (1.1) is broken, stop taking observations. The rationale behind the above procedure is discussed in [8], [1], and in Chapter 1 of [4].

It is important to know whether or not the WAGR test procedure as described above will lead to a decision at finite $n$ for almost every sample sequence, that is, with probability one, whatever $(U - \mu)/\sigma = K$ in fact happens to be. Were this not the case, the WAGR test might not be a satisfactory statistical procedure, and doubt would be thrown on the accuracy of the Wald approximation to the probabilities of Type I and Type II errors. (This approximation says that the probability of Type I error is approximately $\alpha$ and that the probability of Type II error is approximately $\beta$; its derivation depends on the knowledge that almost every sample sequence leads to a decision at finite $n$.)

The proof to be given is rather direct. An alternative proof could no doubt be obtained by the method used by Barnard [1], in which the WAGR test is thought of as a limiting procedure in a sequence of tests, each of which is a weighted sequential probability ratio test in the sense of Wald ([9], Section 4.2.2), weighted by a distribution on population variance. Such a proof would depend on knowing that the weighted sequential probability ratio tests mentioned in the last sentence themselves reach decisions with probability one. Although Wald ([9], Section 3.2) states that under general circumstances such weighted procedures do reach a decision with probability one, so far as we know no exact statement or proof of this has appeared in the literature.
It is also possible that a proof might be had along the lines expounded by Nandi [7], but we are not able to follow Nandi's arguments.
In order to prove that almost every sequence leads to a decision at finite $n$, it is useful to rewrite $l_n(u_n)$ in terms of the variable $v_n = u_n/\sqrt{n}$. We shall feel free to drop the subscript $n$ when convenient. Then, letting $\nu = n - 1$, $l_n(u_n)$, the center of (1.1), becomes, in terms of $v_n$,

$$(1.4)\qquad \begin{aligned} l_n &= \frac{\nu+1}{2}(1 - v_n^2)(K_0^2 - K_1^2) + \ln\frac{Hh_{\nu}(-\sqrt{\nu+1}\,v_nK_1)}{Hh_{\nu}(-\sqrt{\nu+1}\,v_nK_0)} \\ &= \frac{\nu+1}{2}(K_0^2 - K_1^2) + \ln\frac{\int_0^\infty z^{\nu}\exp(-\tfrac12 z^2 + z\sqrt{\nu+1}\,v_nK_1)\,dz}{\int_0^\infty z^{\nu}\exp(-\tfrac12 z^2 + z\sqrt{\nu+1}\,v_nK_0)\,dz}. \end{aligned}$$

It is known that for $n$ (or $\nu$) fixed, $l_n$, as a function of $v_n$, is strictly monotone decreasing (see [5], and note that our $v_n$ is $u$ of [5]). Also, $|v_n| \le 1$.
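The procedure of Section 1 can be sketched in Python. This is only an illustration under stated assumptions: $l_n$ is evaluated through the integral form in (1.4), with each integral computed by Simpson's rule on a grid centred at the maximiser of the integrand (a numerical choice of this sketch, not something taken from the paper), and the function names `log_I`, `wagr_l`, and `wagr_test` are hypothetical. Because $l_n$ decreases in $v_n$, a large $v_n$ pushes towards acceptance of $H_0$.

```python
import math
import random


def log_I(nu, c, n_steps=4000):
    """log of the integral of z**nu * exp(-z**2/2 + c*z) over (0, inf), by Simpson's rule.

    The integrand is divided by its peak value to avoid overflow; nu >= 1 is assumed.
    """
    root = math.sqrt(0.25 * c * c + nu)
    z_hat = 0.5 * c + root if c >= 0 else nu / (root - 0.5 * c)  # maximiser, stable form
    sigma = 1.0 / math.sqrt(1.0 + nu / z_hat ** 2)                 # width of the peak
    lo, hi = max(1e-12, z_hat - 40 * sigma), z_hat + 40 * sigma
    h = (hi - lo) / n_steps                                        # n_steps must be even

    def log_f(z):
        return nu * math.log(z) - 0.5 * z * z + c * z

    peak = log_f(z_hat)
    total = sum((1 if k in (0, n_steps) else 4 if k % 2 else 2)
                * math.exp(log_f(lo + k * h) - peak) for k in range(n_steps + 1))
    return peak + math.log(total * h / 3)


def wagr_l(n, v, K0, K1):
    """l_n as a function of v_n = u_n / sqrt(n), via the integral form of (1.4)."""
    nu, root_n = n - 1, math.sqrt(n)  # sqrt(nu + 1) = sqrt(n)
    return (0.5 * n * (K0 ** 2 - K1 ** 2)
            + log_I(nu, root_n * v * K1) - log_I(nu, root_n * v * K0))


def wagr_test(xs, U, K0, K1, alpha, beta):
    """Apply the stopping rule (1.1) to xs, starting at n = 2.

    Returns ("accept H0" or "accept H1", n), or (None, len(xs)) if (1.1) is never broken.
    """
    lower, upper = math.log(beta / (1 - alpha)), math.log((1 - beta) / alpha)
    s1 = s2 = 0.0
    for n, x in enumerate(xs, start=1):
        s1 += U - x
        s2 += (U - x) ** 2
        if n < 2 or s2 == 0:
            continue  # start at n = 2; u_n is undefined if every X_i equals U
        stat = wagr_l(n, s1 / math.sqrt(n * s2), K0, K1)  # argument is v_n = u_n / sqrt(n)
        if stat <= lower:
            return "accept H0", n  # (1.1) broken at the left
        if stat >= upper:
            return "accept H1", n  # (1.1) broken at the right
    return None, len(xs)


# Hypothetical setting: U = 1, K0 = 1 (p0 about 0.16), K1 = 0 (p1 = 0.5),
# with data drawn from mu = 0, sigma = 1, so that the true K equals K0.
random.seed(0)
xs = [random.gauss(0.0, 1.0) for _ in range(200)]
print(wagr_test(xs, U=1.0, K0=1.0, K1=0.0, alpha=0.05, beta=0.05))
```

Feeding constant data makes the outcome deterministic: if every $U - X_i$ is positive then $v_n = 1$ for all $n$, and the test eventually accepts $H_0$.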



























THE WAGR SEQUENTIAL t-TEST 799


Hence (1.1) is equivalent to

$$(1.5)\qquad R_n < v_n < A_n,$$

where $A_n$ and $R_n$ are acceptance (accept $H_0$) and rejection (accept $H_1$) numbers for $v_n$, functions of $n$, $K_1$, and $K_0$.
The first problem is to show that $A_n$ and $R_n$ have the same limit as $n \to \infty$. From this it will be possible to show that the probability of decision is unity. Note that $A_n$ and $R_n$ are the solutions, respectively, of

$$(1.6)\qquad l_n(A_n) = \ln\frac{\beta}{1-\alpha}, \qquad l_n(R_n) = \ln\frac{1-\beta}{\alpha}.$$

Hence, if we can show that there exists a number $L$ such that $\lim_{n\to\infty} l_n(v) = +\infty$ $(-\infty)$ as $v$ is $<$ $(>)$ $L$, we may conclude that $\lim_{n\to\infty} A_n = \lim_{n\to\infty} R_n = L$, the desired result.


In order to show this it is essential to obtain asymptotic formulas for the 
integrals appearing in (1.4). This is our first step. 


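2. Asymptotic formulas for the integrals of (1.4). This section is devoted to the statement and proof of the following lemma.

LEMMA.

$$\int_0^\infty z^{\nu}\exp(-\tfrac12 z^2 + z\sqrt{\nu+1}\,Kv)\,dz \sim \sqrt{2\pi}\,(\hat z/e)^{\nu}\exp(\tfrac12\hat z^2)\,\{1 + (\sqrt{K^2v^2+4} - Kv)^2/4\}^{-1/2}$$

as $\nu \to \infty$, where $K$ and $v$ are fixed and

$$\hat z = \tfrac12\sqrt{\nu+1}\,Kv + \sqrt{\tfrac14(\nu+1)K^2v^2 + \nu}.$$

The right side of the above asymptotic relation may be replaced by

$$\sqrt{2\pi}\,(\hat z/e)^{\nu}\exp(\tfrac12\hat z^2)\,(1 + \nu/\hat z^2)^{-1/2}.$$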

Since $K$ and $v$ always appear together in this section as a product, we may as well set $Kv = w$. The integral of interest then is

$$(2.1)\qquad \int_0^\infty z^{\nu}\exp(-\tfrac12 z^2 + z\sqrt{\nu+1}\,w)\,dz.$$

The maximum of the integrand occurs at (differentiate)

$$(2.2)\qquad \hat z = \tfrac12\sqrt{\nu+1}\,w + \sqrt{\tfrac14(\nu+1)w^2 + \nu},$$

satisfying

$$(2.3)\qquad \hat z^2 - \hat z\sqrt{\nu+1}\,w - \nu = 0,$$

and suggesting the change of variable $y = z - \hat z$. Thus we may write

$$(2.4)\qquad (2.1) = \int_{-\hat z}^{\infty}(y + \hat z)^{\nu}\exp[-\tfrac12(y + \hat z)^2 + (y + \hat z)\sqrt{\nu+1}\,w]\,dy,$$







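or, by virtue of (2.3),

$$(2.5)\qquad (2.1) = \hat z^{\nu}\exp(-\tfrac12\hat z^2 + \hat z\sqrt{\nu+1}\,w)\int_{-\hat z}^{\infty}\Big(1 + \frac{y}{\hat z}\Big)^{\nu}\exp\Big(-\tfrac12 y^2 - \frac{\nu y}{\hat z}\Big)\,dy.$$

The factors preceding the integral equal, again using (2.3),

$$\hat z^{\nu}\exp(\tfrac12\hat z^2 - \nu) = (\hat z/e)^{\nu}\exp(\tfrac12\hat z^2).$$

It will suffice then to show that the integral in (2.5) has a limit, as $\nu \to \infty$, equal to $\sqrt{2\pi}\,\{1 + (\sqrt{w^2+4} - w)^2/4\}^{-1/2}$, the remaining factor in the statement of the lemma. Note that

$$\lim_{\nu\to\infty}\frac{\hat z}{\sqrt{\nu}} = \frac{w + \sqrt{w^2+4}}{2},$$

and hence that $\hat z \to \infty$ as $\nu \to \infty$. Now rewrite the integral of (2.5) in the form

$$(2.6)\qquad \int_{-\infty}^{\infty} c_{\hat z}(y)\exp\Big\{-\tfrac12 y^2 + \nu\Big[\ln\Big(1 + \frac{y}{\hat z}\Big) - \frac{y}{\hat z}\Big]\Big\}\,dy,$$

where $c_{\hat z}(y) = 0$ or $1$ as $y < -\hat z$ or $y \ge -\hat z$. It is readily checked that when $-\hat z \le y < \infty$,

$$\ln\Big(1 + \frac{y}{\hat z}\Big) - \frac{y}{\hat z} \le 0.$$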


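Hence, for all $y$,

$$0 \le c_{\hat z}(y)\exp\Big\{-\tfrac12 y^2 + \nu\Big[\ln\Big(1 + \frac{y}{\hat z}\Big) - \frac{y}{\hat z}\Big]\Big\} \le \exp(-\tfrac12 y^2),$$

and $\exp(-\tfrac12 y^2)$ is integrable. Thus, by Lebesgue's theorem, the limit of (2.6) as $\nu \to \infty$ is the integral of the pointwise limit. But for each $y$, since $\nu[\ln(1 + y/\hat z) - y/\hat z] = -\tfrac12 y^2\,\nu/\hat z^2 + O(\nu/\hat z^3)$ and $\nu/\hat z^3 \to 0$,

$$\lim_{\nu\to\infty} c_{\hat z}(y)\exp\Big\{-\tfrac12 y^2 + \nu\Big[\ln\Big(1 + \frac{y}{\hat z}\Big) - \frac{y}{\hat z}\Big]\Big\} = \exp\Big\{-\tfrac12 y^2\Big[1 + \lim_{\nu\to\infty}\frac{\nu}{\hat z^2}\Big]\Big\} = \exp\Big\{-\tfrac12 y^2\Big[1 + \frac{(\sqrt{w^2+4} - w)^2}{4}\Big]\Big\}.$$

Hence the limit of (2.6) is the square root of

$$\frac{2\pi}{1 + (\sqrt{w^2+4} - w)^2/4}.$$

This gives the first asymptotic form of the lemma. The second is obtained by noting that

$$\lim_{\nu\to\infty}\frac{\nu/\hat z^2}{(\sqrt{w^2+4} - w)^2/4} = 1,$$

so that (2.6) is asymptotically equivalent to the square root of

$$\frac{2\pi}{1 + \nu/\hat z^2}.$$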







For Kv = 0, the above lemma provides an asymptotic expression for the 
Gamma function that is equivalent to Stirling’s formula. Similar methods have 
been applied to obtain inequalities on the Gamma integral. See, for example: 
J. R. Wilton, “A note on Stirling’s theorem,’’ Mathematical Notes (Edinburgh 
Mathematical Society), No. 28 (1933), xii—xiii. 
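The lemma can be checked numerically. The sketch below is illustrative, with hypothetical names. It compares the logarithm of the integral, computed by Simpson's rule on a grid around its maximiser, with the logarithm of both asymptotic forms; the differences should shrink as $\nu$ grows. For $w = Kv = 0$ the integral equals exactly $2^{(\nu-1)/2}\Gamma(\frac{\nu+1}{2})$, which gives an independent check of the quadrature and ties in with the remark on Stirling's formula.

```python
import math


def log_I(nu, c, n_steps=4000):
    """log of the integral of z**nu * exp(-z**2/2 + c*z) over (0, inf), by Simpson's rule."""
    root = math.sqrt(0.25 * c * c + nu)
    z_hat = 0.5 * c + root if c >= 0 else nu / (root - 0.5 * c)  # maximiser of the integrand
    sigma = 1.0 / math.sqrt(1.0 + nu / z_hat ** 2)
    lo, hi = max(1e-12, z_hat - 40 * sigma), z_hat + 40 * sigma
    h = (hi - lo) / n_steps

    def log_f(z):
        return nu * math.log(z) - 0.5 * z * z + c * z

    peak = log_f(z_hat)
    total = sum((1 if k in (0, n_steps) else 4 if k % 2 else 2)
                * math.exp(log_f(lo + k * h) - peak) for k in range(n_steps + 1))
    return peak + math.log(total * h / 3)


def log_lemma(nu, w, second_form=False):
    """Log of the lemma's asymptotic expression for the integral with c = sqrt(nu + 1) * w."""
    c = math.sqrt(nu + 1) * w
    root = math.sqrt(0.25 * c * c + nu)
    z_hat = 0.5 * c + root if c >= 0 else nu / (root - 0.5 * c)
    if second_form:
        factor = 1 + nu / z_hat ** 2
    else:
        factor = 1 + (math.sqrt(w * w + 4) - w) ** 2 / 4
    return (0.5 * math.log(2 * math.pi) + nu * (math.log(z_hat) - 1)
            + 0.5 * z_hat ** 2 - 0.5 * math.log(factor))


for nu in (10, 100, 1000):
    for w in (-1.0, 0.0, 1.0):
        err = log_I(nu, math.sqrt(nu + 1) * w) - log_lemma(nu, w)
        print(f"nu={nu:5d} w={w:+.1f} log-error={err:+.2e}")
```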


3. The limit of $l_n$. From the lemma we see that for large $\nu$ and fixed $v$, (1.4) behaves like

$$(3.1)\qquad \frac{\nu+1}{2}(K_0^2 - K_1^2) + \nu\ln\frac{\hat z_1}{\hat z_0} + \tfrac12(\hat z_1^2 - \hat z_0^2) + \tfrac12\ln\frac{1 + 4[K_0v + \sqrt{K_0^2v^2+4}]^{-2}}{1 + 4[K_1v + \sqrt{K_1^2v^2+4}]^{-2}},$$

where, for $i = 0, 1$,

$$(3.2)\qquad \hat z_i = \tfrac12\sqrt{\nu+1}\,K_iv + \sqrt{\tfrac14(\nu+1)K_i^2v^2 + \nu}.$$

For fixed $v$, the last term of (3.1) is a constant and the first three terms are of order $\nu$. Dividing (3.1) by $\nu$ and letting $\nu \to \infty$, we see that, as the following quantity is positive or negative,

$$(3.3)\qquad \frac{2 - v^2}{4}(K_0^2 - K_1^2) + \ln\frac{K_1v + \sqrt{K_1^2v^2+4}}{K_0v + \sqrt{K_0^2v^2+4}} + \frac{v}{4}\Big[K_1\sqrt{K_1^2v^2+4} - K_0\sqrt{K_0^2v^2+4}\Big],$$

$l_n$ will have the limit $+\infty$ or $-\infty$. (In obtaining (3.3), (2.3) is used.)


Next we wish to show that there exists a number $L$ ($|L| < 1$) such that (3.3) is positive ($l_n \to +\infty$) for $v < L$ and (3.3) is negative ($l_n \to -\infty$) for $v > L$. Such an $L$ will obviously be unique. The derivative of (3.3) with respect to $v$ is

$$(3.4)\qquad \tfrac12\Big[K_1\sqrt{K_1^2v^2+4} - K_0\sqrt{K_0^2v^2+4}\Big] + \frac{v}{2}(K_1^2 - K_0^2).$$
We shall show that (3.4) is negative, that (3.3) is positive at $v = -1$, and that (3.3) is negative at $v = 1$; these three facts suffice to show the existence of $L$ with the stated properties.
First, (3.4) is, for any fixed $v$, equal to $f(K_1) - f(K_0)$, where

$$(3.5)\qquad f(K) = \tfrac12\big(K\sqrt{K^2v^2+4} + K^2v\big).$$

Since

$$(3.6)\qquad \frac{d}{dK}f(K) = \frac{[\sqrt{K^2v^2+4} + Kv]^2}{2\sqrt{K^2v^2+4}} > 0,$$

and $K_1 < K_0$ by hypothesis, (3.4) is negative for all $v$.







Second, (3.3) evaluated at $v = -1$ is equal to $g(K_1) - g(K_0)$, where

$$(3.7)\qquad g(K) = -\tfrac14K^2 + \ln[-K + \sqrt{K^2+4}] - \tfrac14K\sqrt{K^2+4}.$$

Since

$$(3.8)\qquad \frac{d}{dK}g(K) = -\tfrac12[K + \sqrt{K^2+4}] < 0,$$

and $K_1 < K_0$, (3.3) at $v = -1$ is positive.
Third, (3.3) evaluated at $v = 1$ is equal to $h(K_1) - h(K_0)$, where

$$(3.9)\qquad h(K) = -\tfrac14K^2 + \ln[K + \sqrt{K^2+4}] + \tfrac14K\sqrt{K^2+4}.$$

Since

$$(3.10)\qquad \frac{d}{dK}h(K) = \tfrac12[\sqrt{K^2+4} - K] > 0,$$

and $K_1 < K_0$, (3.3) at $v = 1$ is negative.
This completes the proof that $L$ exists and lies between $-1$ and $1$. Although we do not need the information here, the sign of $L$ may readily be found by evaluating (3.3) at $v = 0$. For there, (3.3) is just $(K_0^2 - K_1^2)/2 = (K_0 - K_1)(K_0 + K_1)/2$, which, since $K_0 > K_1$, has the sign of $K_0 + K_1$. Hence

if $K_1 + K_0 > 0$, $\quad 0 < L < 1$;
if $K_1 + K_0 = 0$, $\quad L = 0$;
if $K_1 + K_0 < 0$, $\quad -1 < L < 0$.
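The number $L$ has no closed form, but since (3.3) is strictly decreasing in $v$, positive at $v = -1$, and negative at $v = 1$, it can be located by bisection. The sketch below evaluates (3.3) as $G(K_1) - G(K_0)$, where $G(K) = -\frac{2 - v^2}{4}K^2 + \ln(Kv + \sqrt{K^2v^2+4}) + \frac{v}{4}K\sqrt{K^2v^2+4}$ is a hypothetical helper introduced only to organize the computation. It also evaluates (3.4) so that it can be compared with a finite difference of (3.3). The function names and the tolerance are assumptions.

```python
import math


def limit_rate(v, K0, K1):
    """Expression (3.3): the limit of l_n / nu for fixed v."""
    def G(K):
        s = math.sqrt(K * K * v * v + 4)
        return -(2 - v * v) / 4 * K * K + math.log(K * v + s) + v / 4 * K * s
    return G(K1) - G(K0)


def rate_derivative(v, K0, K1):
    """Expression (3.4), written as f(K1) - f(K0) with f as in (3.5)."""
    def f(K):
        return 0.5 * (K * math.sqrt(K * K * v * v + 4) + K * K * v)
    return f(K1) - f(K0)


def find_L(K0, K1, tol=1e-12):
    """Bisection for the unique root L of (3.3) in (-1, 1); assumes K1 < K0."""
    lo, hi = -1.0, 1.0  # (3.3) is positive at v = -1 and negative at v = 1
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if limit_rate(mid, K0, K1) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


for K0, K1 in [(1.0, 0.0), (0.5, -0.5), (0.0, -1.0), (2.0, 1.0)]:
    print(K0, K1, round(find_L(K0, K1), 6))
```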


One might expect $L$ always to lie between $K_0/\sqrt{1+K_0^2}$ and $K_1/\sqrt{1+K_1^2}$, since these are the stochastic limits of $v_n$ under $H_0$ and $H_1$, respectively. However, this does not seem to be the case in general.


4. Proof that the probability of decision is one. It is now easy to prove that the WAGR test reaches a decision with probability one for every value of $K$ save one. An immediate method is to note that $v_n = u_n/\sqrt{n}$ converges almost everywhere to $K/\sqrt{1+K^2}$. Hence, for every value of $K$ except one, $v_n$ will, for large enough $n$ (depending on $K$), have crossed a decision limit. Were the probability of no decision positive, this would contradict the strong convergence of $v_n$. The one exception is the case for which $K/\sqrt{1+K^2} = L$.

An alternative argument follows that of Cox [2]. We note that

$$(4.1)\qquad \frac{u_n - \sqrt{n}\,K/\sqrt{1+K^2}}{\sqrt{(1 + \tfrac12K^2)/(1+K^2)^3}}$$

converges in distribution to the unit-normal distribution. This follows from the delta method theorem applied to $u_n$, expressed in terms of the sample mean and variance of the $(U - X_i)$'s.
Hence the probability that $v_n$ lies in the interval $[R_n, A_n]$, which is the same as the probability that $u_n$ lies in the interval $[\sqrt{n}R_n, \sqrt{n}A_n]$, becomes arbitrarily close, by uniformity of convergence, to

$$(4.2)\qquad \Phi\Big(\frac{\sqrt{n}A_n - \sqrt{n}K/\sqrt{1+K^2}}{\sqrt{(1+\tfrac12K^2)/(1+K^2)^3}}\Big) - \Phi\Big(\frac{\sqrt{n}R_n - \sqrt{n}K/\sqrt{1+K^2}}{\sqrt{(1+\tfrac12K^2)/(1+K^2)^3}}\Big),$$

where $\Phi$ denotes the unit-normal distribution function. Now, provided $L$ is not $K/\sqrt{1+K^2}$, (4.2) $\to 0$, for both terms together approach either one or zero. But the probability that a decision is not reached by $n$ is $\le$ the probability that $R_n \le v_n \le A_n$, and this has the limit zero.
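The normalizing constants in (4.1) can be checked by simulation. The sketch below is an illustration under stated assumptions. It takes $\sigma = 1$, so that $U - X_i \sim N(K, 1)$, draws $u_n$ repeatedly with a fixed seed, and compares the sample variance of $u_n - \sqrt{n}K/\sqrt{1+K^2}$ with $(1 + \frac12K^2)/(1+K^2)^3$. The sample size, the number of replications, and the function names are arbitrary choices.

```python
import math
import random
import statistics


def u_stat(ys):
    """u_n = sum(y_i) / sqrt(sum(y_i^2)) for the differences y_i = U - X_i."""
    return sum(ys) / math.sqrt(sum(y * y for y in ys))


def delta_variance(K):
    """Asymptotic variance of u_n appearing in (4.1): (1 + K^2/2) / (1 + K^2)^3."""
    return (1 + 0.5 * K * K) / (1 + K * K) ** 3


def simulate_centred_u(K, n, reps, seed=0):
    """Draws of u_n - sqrt(n) K / sqrt(1 + K^2), with sigma = 1 so that U - X_i ~ N(K, 1)."""
    rng = random.Random(seed)
    centre = math.sqrt(n) * K / math.sqrt(1 + K * K)
    return [u_stat([rng.gauss(K, 1.0) for _ in range(n)]) - centre for _ in range(reps)]


draws = simulate_centred_u(K=1.0, n=100, reps=4000)
print(statistics.variance(draws), delta_variance(1.0))
```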


5. Special argument if $K$ is such that the common limit of $A_n$ and $R_n$ is $K/\sqrt{1+K^2}$. The above arguments fail if $L$ is $K/\sqrt{1+K^2}$, and it then becomes necessary to look at the speed of convergence.

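If we differentiate (1.4) with respect to $v_n$, we obtain

$$(5.1)\qquad K_1\sqrt{\nu+1}\;\frac{\int_0^\infty z^{\nu+1}\exp(-\tfrac12 z^2 + z\sqrt{\nu+1}\,v_nK_1)\,dz}{\int_0^\infty z^{\nu}\exp(-\tfrac12 z^2 + z\sqrt{\nu+1}\,v_nK_1)\,dz}$$

minus a similar quantity with $K_1$ replaced by $K_0$. Following the same argument as that in Section 3, but with $\hat z$ replaced by

$$(5.2)\qquad \tilde z = \tfrac12\sqrt{\nu+1}\,Kv_n + \sqrt{\tfrac14(\nu+1)K^2v_n^2 + \nu + 1},$$

where

$$(5.3)\qquad \tilde z^2 - \tilde z\sqrt{\nu+1}\,Kv_n - (\nu + 1) = 0,$$

we find that (5.1) for large $\nu$ behaves like

$$(5.4)\qquad K_1\sqrt{\nu+1}\;\tilde z_1\Big(\frac{\tilde z_1}{\hat z_1}\Big)^{\nu}\exp[\tfrac12(\tilde z_1^2 - \hat z_1^2) - 1],$$

where $\tilde z_1$ is $\tilde z$ for $K = K_1$. Now, letting $w_1 = v_nK_1$,

$$(5.5)\qquad \lim_{\nu\to\infty}\Big(\frac{\tilde z_1}{\hat z_1}\Big)^{\nu} = \exp[2(w_1\sqrt{w_1^2+4} + w_1^2 + 4)^{-1}]$$

and

$$(5.6)\qquad \lim_{\nu\to\infty}(\tilde z_1^2 - \hat z_1^2) = 1 + \frac{w_1}{\sqrt{w_1^2+4}}.$$

It follows that the derivative of (1.4) with respect to $v_n$, divided by $\nu + 1$, has the limit

$$(5.7)\qquad \tfrac12K_1\big[w_1 + \sqrt{w_1^2+4}\big] - \tfrac12K_0\big[w_0 + \sqrt{w_0^2+4}\big],$$

where $w_0 = v_nK_0$. Call this limit $\Lambda(v_n)$; we shall assume that $\Lambda(L) < 0$ and demonstrate this later.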
By the law of the mean,

$$(5.8)\qquad \frac{\ln[\beta/(1-\alpha)] - \ln[(1-\beta)/\alpha]}{(A_n - R_n)\,n} = \frac{1}{n}\times[\text{derivative of (1.4) at } v_n = \theta],$$

where $\theta$ lies between $A_n$ and $R_n$. Now $\Lambda(v_n)$ is negative in some interval about $v_n = L$, by the assumption of the last paragraph and the continuity of (5.7). Hence the right side of (5.8) has the negative limit $\Lambda(L)$ as $n \to \infty$. Hence $\lim n(A_n - R_n)$ is a constant and $\lim \sqrt{n}(A_n - R_n) = 0$. Thus the argument at the end of Section 4 applies.

It remains to show that $\Lambda(L) < 0$. This follows directly from the observation that the partial derivative with respect to $K$ of $K[Kv + \sqrt{K^2v^2+4}]$ is positive, together with $K_1 < K_0$.


6. Further comments. The question of whether or not a decision is reached with 
probability one may be asked, not only about the WAGR test itself, but also 
about each of the several approximations to it that have been suggested. 

For example, Wallis ([4], Chapter 1) suggested a sequential procedure that approximates the WAGR test in a manner described by Kruskal [6]. It may be shown relatively easily that the Wallis procedure leads to a decision with probability one. Other approximations have been suggested by Kruskal [6], but for them the probabilities of decision have not been investigated. Rushton [8] suggests three approximations to the WAGR test, but we know of no theoretical work on their properties.

It may be noted that the asymptotic expressions of this paper suggest still another approximation to the WAGR test, namely the use of (3.1) as an approximation to $l_n$.

Although we have shown that $A_n$ and $R_n$ converge to the same limit as $n \to \infty$, nothing has been said about monotonicity of approach. One feels that $A_n$ should approach $L$ from above and $R_n$ from below monotonically. However, consideration of the crossing point of $l_n$ and $l_{n+1}$ suggests that this cannot be true for all $\alpha$'s and $\beta$'s. It may very likely be true, for given $\alpha$ and $\beta$, when $n$ is sufficiently large; and again it may be true for all $n$ if $\alpha$ and $\beta$ are restricted. These seem to be open questions.

We feel that the conclusions of this paper should become a special case of 
some much more general results. Perhaps when they do, questions such as those 
of the above paragraph will become more easily handled. 


7. Acknowledgements. We wish to acknowledge warmly the aid given us by 
M. D. Kruskal (Princeton) in obtaining the asymptotic expressions used in this 
paper. Other persons who have been helpful are M. Rosenblatt (Chicago), D. 
Lindley (Cambridge), R. R. Bahadur (Chicago), and J. Kiefer (Cornell). 


REFERENCES 


[1] G. A. Barnard, "The frequency justification of certain sequential tests," Biometrika, Vol. 39 (1952), pp. 144-150. See also an addendum to this paper in Biometrika, Vol. 40 (1953), pp. 468-469.

[2] D. R. Cox, "Sequential tests for composite hypotheses," Proc. Cambridge Philos. Soc., Vol. 48 (1952), pp. 290-299.

[3] Owen L. Davies (ed.), The Design and Analysis of Industrial Experiments, Oliver and Boyd, London (1954), Chapter 3 and Table L.

[4] Churchill Eisenhart, Millard W. Hastay, W. Allen Wallis (eds.), Selected Techniques of Statistical Analysis, McGraw-Hill, New York, 1947.

[5] William Kruskal, "The monotonicity of the ratio of two noncentral t density functions," Ann. Math. Stat., Vol. 25 (1954), pp. 162-165.

[6] William Kruskal, "Approximate sequential tests for hypotheses about the proportion of a normal population to one side of a given number" (Abstract), Ann. Math. Stat., Vol. 26 (1955), pp. 150-151.

[7] S. K. Nandi, "Use of well-known statistics in sequential analysis," Sankhyā, Vol. 8 (1948), pp. 339-344.

[8] S. Rushton, "On a sequential t-test," Biometrika, Vol. 37 (1950), pp. 326-333.

[9] Abraham Wald, Sequential Analysis, John Wiley and Sons, New York, 1947.





THE USE OF GENERALIZED PROBABILITY PAPER FOR CONTINUOUS
DISTRIBUTIONS¹


By Herman Chernoff and Gerald J. Lieberman


Stanford University 


1. Summary. The problem of plotting on probability paper is extended to 
continuous distributions which are completely specified except for scale and 
location parameters. Necessary and sufficient conditions are given to ensure 
that the plot which is optimal for estimating the scale parameter is also optimal 
for estimating each of the percentiles. 


2. Introduction. In a previous paper [2], the question of how to plot a sample 
from a normal population on normal probability paper was raised. The main pur- 
pose of that paper was to illustrate that the optimal construction of a graph de- 
pends on the use to which the graph would be put. In particular, the best plot 
for estimating the mean and standard deviation was discussed. Although the 
proposed method of plotting was considered to be merely an illustration of the 
above-mentioned principle, considerable comment about its usefulness was 
aroused. Therefore, it was decided to extend the problem to a general continuous 
distribution with finite variance which is specified except for a location and scale 
parameter. Special examples of interest are the exponential and extreme-value 
distributions. 

The optimization methods used in this paper are applications of the method of Lagrange multipliers, and they essentially reproduce some of the results given by Downton [4][5], Godwin [7], Lloyd [11], and Sarhan [12][13].


3. Preliminaries. Let $x_1, x_2, \ldots, x_n$ be the ordered observations on a continuous chance variable $X$, where

$$(1)\qquad X = \mu + \sigma Y$$

and where $Y$ has mean 0 and variance 1. By a suitable monotonic transformation of the vertical scale, it is possible to transform the c.d.f. of $Y$ and of all linear functions of $Y$ to straight lines. In fact, this is accomplished by plotting the $p$ percentile at a distance $v = F^{-1}(p)$ above the $x$-axis, where $F$ is the c.d.f. of $Y$.
We shall use the term "plot" to represent a choice of $n$ numbers $p_1, p_2, \ldots, p_n$ (or the corresponding $v$'s, $v_1, v_2, \ldots, v_n$) which are attached to $x_1, x_2, \ldots, x_n$, respectively. It will be understood that the use of a "plot" corresponds to the plotting of the points $(x_1, p_1), (x_2, p_2), \ldots, (x_n, p_n)$. Three examples of such plots are $(x_1, 1/n), (x_2, 2/n), \ldots, (x_n, n/n)$; $(x_1, 1/(n+1)), (x_2, 2/(n+1)), \ldots, (x_n, n/(n+1))$; and $(x_1, 1/2n), (x_2, 3/2n), \ldots, (x_n, (2n-1)/2n)$. It will frequently be more convenient to consider the points in the linear scale, i.e., the points $(x_1, v_1), (x_2, v_2), \ldots, (x_n, v_n)$. In the first example mentioned above,

Received May 31, 1955. 
¹ Work done under the sponsorship of the Office of Naval Research.


806 


































GENERALIZED PROBABILITY PAPER 807 


the plot in the linear scale is represented by $(x_1, F^{-1}(1/n)), (x_2, F^{-1}(2/n)), \ldots, (x_n, F^{-1}(n/n))$. Since there is no obvious rationale for choosing a plot, there arises the problem of selecting an "optimum" plot.

A "plot" is to be used to estimate the scale parameter or the percentiles of the $X$ distribution in the following fashion. Visually fit a straight line through the $n$ points. We shall assume that this fitted straight line is a good approximation to the line which would be obtained by minimizing the sum of the squares of the horizontal deviations of the points from the line. We take horizontal deviations because the $x_i$ are the random variables. Suppose that this fitted straight line is given by

(2) $x = a + bv$.

An estimate of the standard deviation, $\sigma$, is given by $b$. If it is desired to estimate the $p_0$ percentile, we may use $\hat x_0 = a + bv_0$, where $v_0 = F^{-1}(p_0)$. Graphically, these procedures are described as follows. To estimate $\sigma$, take the difference of the abscissas of the two points on the line whose ordinates are the c.d.f. values corresponding to $\mu$ and $\mu + \sigma$. Since these c.d.f. values are $F(0)$ and $F(1)$, the ordinates in the $v$ scale are 0 and 1. To estimate the $p_0$ percentile, take that value of $x$ where the line has ordinate $p_0$ ($v_0$ in the linear scale).

The problem of estimating $\mu$ may be regarded as that of estimating the $p_0$ percentile, where $p_0 = F(0)$. One can treat the mode or other location parameters similarly.

To each plot there are associated estimates of $\sigma$ and the percentiles. If we assume that the visually fitted straight line is actually the least-squares line, these estimates are of a special type. In fact,

(3) $\hat\sigma = b = \dfrac{\sum_{i=1}^{n} x_i (v_i - \bar v)}{\sum_{i=1}^{n} (v_i - \bar v)^2}$,

(4) $a = \bar x - b\bar v$,

(5) $\hat x_0 = \bar x + (v_0 - \bar v)\dfrac{\sum_{i=1}^{n} x_i (v_i - \bar v)}{\sum_{i=1}^{n} (v_i - \bar v)^2}$.

The estimate of $\sigma$ is a contrast in the ordered observations (i.e., a linear function of the $x_i$, the sum of whose coefficients is zero). The estimate of $x_0$ is a weighted average of the ordered observations. Let

(6) $u_i = \dfrac{v_i - \bar v}{\sum_{j=1}^{n} (v_j - \bar v)^2}, \qquad w_i = (v_0 - \bar v)u_i$,

(7) $\beta_i = E(y_i)$,


808 HERMAN CHERNOFF AND GERALD J. LIEBERMAN 


where $y_i$ is the $i$th ordered observation of a sample of $n$ observations on $Y$, and let

(8) $\sigma_{ij} = E\{(y_i - \beta_i)(y_j - \beta_j)\}$.


Let $u$, $w$, $\beta$, and $x$ be column vectors whose elements are $u_i$, $w_i$, $\beta_i$, and $x_i$, respectively. Let $\Sigma = \|\sigma_{ij}\|$ and let $e$ be the column vector all of whose elements are $1/n$.

Then we may write

(9) $\hat\sigma = u'x$,

(10) $\hat x_0 = (e + w)'x$.

Note that the definition of $u$ imposes the sole restriction $e'u = 0$ on $u$. Similarly, the definition of $w$ imposes the sole restriction $e'w = 0$ on $w$ unless $v_0 - \bar v = 0$, in which case $w = 0$.

The following relations hold: 


(11) $E(x) = n\mu e + \sigma\beta$,
(12) $\beta'e = 0$,


(13) 
(14) $\dfrac{u}{u'u} = \dfrac{(v_0 - \bar v)\,w}{w'w}$.

We may also remark that $\Sigma$ is positive definite.
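4. Estimation of $\sigma$. In this section we derive the plots which yield the minimum variance unbiased estimate of $\sigma$, and the estimate of $\sigma$ with the minimum second moment about $\sigma$. For the first we minimize

(U.S.D.1)² $E(\hat\sigma - \sigma)^2 = \sigma^2\, u'\Sigma u$

subject to the restrictions

$e'u = 0, \qquad \beta'u = 1$

(the second is the condition $E(\hat\sigma) = \sigma$, by (11)). We obtain

$\Sigma u = \lambda_1\beta + \lambda_2 e$

or

(U.S.D.2) $u = \lambda_1\Sigma^{-1}\beta + \lambda_2\Sigma^{-1}e$,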


² In Section 4 all the equations that are prefixed by U.S.D. indicate that they are applicable to the case of unbiased estimation of the standard deviation. All the equations that are prefixed by B.S.D. indicate that they are applicable to the case of biased estimation of the standard deviation.







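where $\lambda_1$ and $\lambda_2$, the Lagrange multipliers determined by the above restrictions, are

(U.S.D.3) $\lambda_1 = \dfrac{e'\Sigma^{-1}e}{\Delta}$,

(U.S.D.4) $\lambda_2 = -\dfrac{e'\Sigma^{-1}\beta}{\Delta}$,

(15) $\Delta = (\beta'\Sigma^{-1}\beta)(e'\Sigma^{-1}e) - (\beta'\Sigma^{-1}e)^2$.

Thus,

$E(\hat\sigma - \sigma)^2 = \sigma^2 u'\Sigma u = \sigma^2 u'(\lambda_1\beta + \lambda_2 e) = \lambda_1\sigma^2$,

(U.S.D.5) $E(\hat\sigma - \sigma)^2 = \sigma^2\,\dfrac{e'\Sigma^{-1}e}{\Delta}$.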
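Equations (U.S.D.2)-(U.S.D.5) require only two linear solves with $\Sigma$. The Python sketch below is our own illustration, not the authors' procedure; it assumes $\beta$ and $\Sigma$ are supplied by the user (for instance from tables of moments of order statistics), and its helper names are hypothetical. A small Gaussian elimination keeps it free of third-party libraries.

```python
def solve(M, rhs):
    """Solve M z = rhs by Gaussian elimination with partial pivoting."""
    n = len(M)
    A = [[float(a) for a in row] + [float(r)] for row, r in zip(M, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n + 1):
                A[r][c] -= f * A[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (A[r][n] - sum(A[r][c] * z[c] for c in range(r + 1, n))) / A[r][r]
    return z


def dot(p, q):
    return sum(a * b for a, b in zip(p, q))


def unbiased_sigma_plot(beta, Sigma):
    """Return (u, factor): u from (U.S.D.2) and E(sigma_hat - sigma)^2 / sigma^2."""
    n = len(beta)
    e = [1.0 / n] * n
    Si_b, Si_e = solve(Sigma, beta), solve(Sigma, e)   # Sigma^{-1} beta, Sigma^{-1} e
    A, B, C = dot(beta, Si_b), dot(e, Si_b), dot(e, Si_e)
    delta = A * C - B * B                               # (15)
    lam1, lam2 = C / delta, -B / delta                  # (U.S.D.3), (U.S.D.4)
    u = [lam1 * sb + lam2 * se for sb, se in zip(Si_b, Si_e)]
    return u, C / delta                                 # (U.S.D.5)
```

For $n = 2$ and the standardized exponential distribution treated in Section 7, $\beta = (-1/2, 1/2)'$ and $\Sigma$ has rows $(1/4, 1/4)$ and $(1/4, 5/4)$; the sketch then gives $u = (-1, 1)'$, i.e. $\hat\sigma = x_2 - x_1$, with $E(\hat\sigma - \sigma)^2 = \sigma^2$.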


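Now let us derive the equations for the plot which yields the estimate of $\sigma$ that has the minimum second moment about $\sigma$. We minimize

(B.S.D.1) $E(\hat\sigma - \sigma)^2 = \sigma^2[u'\Sigma u + (\beta'u - 1)^2]$

subject to the restriction $e'u = 0$. We have $\Sigma u + (\beta'u - 1)\beta = \lambda e$, where $\lambda$ is the Lagrange multiplier. It then follows that $u'\Sigma u + (\beta'u - 1)\beta'u = 0$, whence $E\{(\hat\sigma - \sigma)^2\}/\sigma^2 = u'\Sigma u + (\beta'u - 1)^2 = 1 - \beta'u$. Now,

$u = \lambda\Sigma^{-1}e + (1 - \beta'u)\Sigma^{-1}\beta$,

$\lambda e'\Sigma^{-1}e + (1 - \beta'u)e'\Sigma^{-1}\beta = 0$,

$\lambda\beta'\Sigma^{-1}e + (1 - \beta'u)\beta'\Sigma^{-1}\beta = \beta'u = 1 - (1 - \beta'u)$,

$\lambda = -\dfrac{e'\Sigma^{-1}\beta}{\Delta^*}, \qquad 1 - \beta'u = \dfrac{e'\Sigma^{-1}e}{\Delta^*}$,

(16) $\Delta^* = \Delta + e'\Sigma^{-1}e$,

(B.S.D.2) $u = \dfrac{1}{\Delta^*}\left[(e'\Sigma^{-1}e)\Sigma^{-1}\beta - (e'\Sigma^{-1}\beta)\Sigma^{-1}e\right]$,

and

(B.S.D.3) $E\{(\hat\sigma - \sigma)^2\} = \sigma^2\,\dfrac{e'\Sigma^{-1}e}{\Delta^*}$.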
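The biased plot differs from the unbiased one only through $\Delta^*$. The self-contained sketch below is again our own (hypothetical names; it repeats a plain Gaussian elimination so that no library is assumed) and evaluates (B.S.D.2) and (B.S.D.3).

```python
def solve(M, rhs):
    """Solve M z = rhs by Gaussian elimination with partial pivoting."""
    n = len(M)
    A = [[float(a) for a in row] + [float(r)] for row, r in zip(M, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n + 1):
                A[r][c] -= f * A[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (A[r][n] - sum(A[r][c] * z[c] for c in range(r + 1, n))) / A[r][r]
    return z


def dot(p, q):
    return sum(a * b for a, b in zip(p, q))


def biased_sigma_plot(beta, Sigma):
    """Return (u, factor): u from (B.S.D.2) and E{(sigma_hat - sigma)^2} / sigma^2."""
    n = len(beta)
    e = [1.0 / n] * n
    Si_b, Si_e = solve(Sigma, beta), solve(Sigma, e)
    A, B, C = dot(beta, Si_b), dot(e, Si_b), dot(e, Si_e)
    delta_star = A * C - B * B + C          # (16): Delta* = Delta + e' Sigma^{-1} e
    u = [(C * sb - B * se) / delta_star for sb, se in zip(Si_b, Si_e)]
    return u, C / delta_star                # (B.S.D.3)
```

With the same $n = 2$ exponential inputs, the sketch gives $u = (-1/2, 1/2)'$ and a second moment of $\sigma^2/2$, half the variance of the unbiased choice; the two $u$ vectors are proportional.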


It might be noted that the $u$ vectors for the unbiased and biased estimates are proportional. In fact, the biased estimates could easily be derived from the unbiased ones by the same arguments as those used by the authors in [2] and by Goodman in [8].

It should be noted that the value of $\bar v$ was immaterial. Geometrically, this means merely that raising or lowering the line does not change its slope.







5. Estimation of the $p_0$ percentile. In this section we shall discuss the plot which furnishes the minimum variance unbiased estimate of the $p_0$ percentile, $x_0$, and also the plot which furnishes the estimate with the minimum second moment about $x_0$.

For the first problem we minimize

(U.P.1)³ $E\{(\hat x_0 - x_0)^2\} = \sigma^2(e + w)'\Sigma(e + w)$

subject to the restrictions

$e'w = 0, \qquad \beta'w = v_0$.

We have

$\Sigma(e + w) = \lambda_1\beta + \lambda_2 e$

or

(U.P.2) $w = \lambda_1\Sigma^{-1}\beta + \lambda_2\Sigma^{-1}e - e$,

where $\lambda_1$ and $\lambda_2$ are the Lagrange multipliers given by

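(U.P.3) $\lambda_1 = \dfrac{1}{\Delta}\left[v_0(e'\Sigma^{-1}e) - \dfrac{1}{n}(e'\Sigma^{-1}\beta)\right]$

and

(U.P.4) $\lambda_2 = \dfrac{1}{\Delta}\left[\dfrac{1}{n}(\beta'\Sigma^{-1}\beta) - v_0(e'\Sigma^{-1}\beta)\right]$.

Thus,

$E\{(\hat x_0 - x_0)^2\} = \sigma^2(e + w)'(\lambda_1\beta + \lambda_2 e) = \sigma^2(\lambda_1 v_0 + \lambda_2/n)$

or

(U.P.5) $E\{(\hat x_0 - x_0)^2\} = \dfrac{\sigma^2}{\Delta}\left[\dfrac{\beta'\Sigma^{-1}\beta}{n^2} - \dfrac{2v_0(e'\Sigma^{-1}\beta)}{n} + v_0^2(e'\Sigma^{-1}e)\right]$.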


For the problem of minimizing the second moment of $\hat x_0$ about $x_0$, we minimize

(B.P.1) $E\{(\hat x_0 - x_0)^2\} = \sigma^2[(e + w)'\Sigma(e + w) + (\beta'w - v_0)^2]$

subject to $e'w = 0$. We have

$\Sigma(e + w) + (\beta'w - v_0)\beta = \lambda e$,

$(e + w)'\Sigma(e + w) + (\beta'w - v_0)(\beta'w) = \lambda/n$,


³ In Sections 5 and 6 all the equations that are prefixed by U.P. indicate that they are applicable to the case of unbiased estimation of the $p_0$ percentile. All the equations that are prefixed by B.P. indicate that they are applicable to the case of biased estimation of the $p_0$ percentile.










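whence

$E\{(\hat x_0 - x_0)^2\} = \sigma^2[(e + w)'\Sigma(e + w) + (\beta'w - v_0)^2] = \sigma^2[\lambda/n + v_0(v_0 - \beta'w)]$.

Now,

(B.P.2) $e + w = \lambda\Sigma^{-1}e + (v_0 - \beta'w)\Sigma^{-1}\beta$,

$\lambda e'\Sigma^{-1}e + (v_0 - \beta'w)e'\Sigma^{-1}\beta = 1/n$,

$\lambda\beta'\Sigma^{-1}e + (v_0 - \beta'w)\beta'\Sigma^{-1}\beta = \beta'w = v_0 - (v_0 - \beta'w)$,

(B.P.3) $v_0 - \beta'w = \dfrac{1}{\Delta^*}\left[v_0(e'\Sigma^{-1}e) - \dfrac{1}{n}(e'\Sigma^{-1}\beta)\right]$,

(B.P.4) $\lambda = \dfrac{1}{\Delta^*}\left[\dfrac{\beta'\Sigma^{-1}\beta + 1}{n} - v_0(e'\Sigma^{-1}\beta)\right]$.

Thus,

(B.P.5) $E\{(\hat x_0 - x_0)^2\} = \dfrac{\sigma^2}{\Delta^*}\left\{\dfrac{1}{n}\left[\dfrac{\beta'\Sigma^{-1}\beta + 1}{n} - v_0(e'\Sigma^{-1}\beta)\right] + v_0\left[v_0(e'\Sigma^{-1}e) - \dfrac{1}{n}(e'\Sigma^{-1}\beta)\right]\right\}$

or

(B.P.6) $E\{(\hat x_0 - x_0)^2\} = \dfrac{\sigma^2}{\Delta^*}\left[\dfrac{\beta'\Sigma^{-1}\beta}{n^2} - \dfrac{2v_0(e'\Sigma^{-1}\beta)}{n} + v_0^2(e'\Sigma^{-1}e) + \dfrac{1}{n^2}\right]$.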

In each of these problems the value of $\bar v$ is not determined. Geometrically, this means that if the $v_0 - v_i$ are multiplied by a common factor, the position where the fitted line has ordinate $v_0$ will not be affected. Since $\bar v$ is the only undetermined element of the optimal plot, it follows that the optimal weighted average of the ordered observations for estimating $x_0$ is unique for both the biased and unbiased cases.
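Both optimal weighted averages for $x_0$ can be computed from the same three quadratic forms $\beta'\Sigma^{-1}\beta$, $e'\Sigma^{-1}\beta$, and $e'\Sigma^{-1}e$. The sketch below is ours (names hypothetical, with $\beta$ and $\Sigma$ assumed given); it returns the coefficient vector $e + w$ of (10), so that $\hat x_0 = (e + w)'x$, together with the factor multiplying $\sigma^2$ in the unbiased or biased mean squared error.

```python
def solve(M, rhs):
    """Solve M z = rhs by Gaussian elimination with partial pivoting."""
    n = len(M)
    A = [[float(a) for a in row] + [float(r)] for row, r in zip(M, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n + 1):
                A[r][c] -= f * A[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (A[r][n] - sum(A[r][c] * z[c] for c in range(r + 1, n))) / A[r][r]
    return z


def dot(p, q):
    return sum(a * b for a, b in zip(p, q))


def percentile_weights(beta, Sigma, v0, unbiased=True):
    """Return (c, factor) with c = e + w, x0_hat = c'x, factor = E{(x0_hat - x0)^2}/sigma^2."""
    n = len(beta)
    e = [1.0 / n] * n
    Si_b, Si_e = solve(Sigma, beta), solve(Sigma, e)
    A, B, C = dot(beta, Si_b), dot(e, Si_b), dot(e, Si_e)
    delta = A * C - B * B
    if unbiased:
        lam1 = (v0 * C - B / n) / delta                            # (U.P.3)
        lam2 = (A / n - v0 * B) / delta                            # (U.P.4)
        c = [lam1 * sb + lam2 * se for sb, se in zip(Si_b, Si_e)]  # (U.P.2) plus e
        return c, lam1 * v0 + lam2 / n                             # (U.P.5)
    delta_star = delta + C                                         # (16)
    d = (v0 * C - B / n) / delta_star                              # (B.P.3): v0 - beta'w
    lam = ((A + 1) / n - v0 * B) / delta_star                      # (B.P.4)
    c = [lam * se + d * sb for sb, se in zip(Si_b, Si_e)]          # (B.P.2)
    return c, lam / n + v0 * d                                     # (B.P.5)
```

For the $n = 2$ exponential example and $v_0 = 1$, the unbiased weights come out as $(-1/2, 3/2)$ with factor $5/2$, and the biased weights as $(1/4, 3/4)$ with factor $11/8$; in both cases the weights sum to one, as $e'w = 0$ requires.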


6. Invariance. In this section we shall study the conditions which imply that the optimal plot does not depend on the percentile being estimated. In fact, we shall call a plot an invariant optimal plot if for each $x_0$ it yields an estimate $\hat x_0$ which minimizes $E\{(\hat x_0 - x_0)^2\}$. We shall call a plot an invariant optimal unbiased plot if for each $x_0$ it yields an unbiased estimate $\hat x_0$ which minimizes $E\{(\hat x_0 - x_0)^2\}$.

We shall use the terms optimal weighted average for $x_0$ and optimal unbiased weighted average for $x_0$ similarly.







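LEMMA 1. If there are real numbers $c$ and $k$ such that $\Sigma e = ce + k\beta$, it follows that

(17) $\Sigma e = e + k\beta$,

(18) $\beta'\Sigma^{-1}e = -k\,\beta'\Sigma^{-1}\beta$,

(19) $e'\Sigma^{-1}e = \dfrac{1}{n} + k^2\,\beta'\Sigma^{-1}\beta$,

(20) $\Delta = \dfrac{1}{n}\,\beta'\Sigma^{-1}\beta$,

(21) $\Delta^* = \left(\dfrac{1}{n} + k^2\right)\beta'\Sigma^{-1}\beta + \dfrac{1}{n}$.

PROOF. Premultiplying $\Sigma e = ce + k\beta$ by $e'$ and using $e'\beta = 0$ (see (12)) and $e'\Sigma e = \operatorname{Var}(\bar y) = 1/n$ (the mean of the ordered observations is the mean of the unordered ones), we obtain $c = 1$. Premultiplying

$e = \Sigma^{-1}e + k\Sigma^{-1}\beta$

by $\beta'$ and $e'$, we obtain (18) and (19). The remaining equations follow by substitution.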
THEOREM 1. There is an invariant optimal unbiased plot if and only if $\Sigma e = e + k\beta$. In that case the invariant optimal plot is unique, has $\bar v = 0$, and is optimal for the unbiased estimation of $\sigma$. Also,

(U.P.6) $E\{(\hat x_0 - x_0)^2\} = \sigma^2\left\{\dfrac{(1 + nkv_0)^2}{n} + \dfrac{v_0^2}{\beta'\Sigma^{-1}\beta}\right\}$.


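PROOF. Suppose that there is an invariant optimal unbiased plot. Then (see (U.P. 2, 3, and 4))

$w = (v_0 - \bar v)\,w^{(1)} + w^{(2)}$,

where

$w^{(1)} = \dfrac{1}{\Delta}\left[(e'\Sigma^{-1}e)\Sigma^{-1}\beta - (e'\Sigma^{-1}\beta)\Sigma^{-1}e\right]$,

$w^{(2)} = \dfrac{1}{\Delta}\left\{\left[\bar v(e'\Sigma^{-1}e) - \dfrac{1}{n}(e'\Sigma^{-1}\beta)\right]\Sigma^{-1}\beta + \left[\dfrac{1}{n}(\beta'\Sigma^{-1}\beta) - \bar v(e'\Sigma^{-1}\beta)\right]\Sigma^{-1}e\right\} - e$.

Applying (14), we have

$v_i - \bar v = \dfrac{(v_0 - \bar v)\,w_i}{w'w} = \dfrac{(v_0 - \bar v)\left[(v_0 - \bar v)w_i^{(1)} + w_i^{(2)}\right]}{\left[(v_0 - \bar v)w^{(1)} + w^{(2)}\right]'\left[(v_0 - \bar v)w^{(1)} + w^{(2)}\right]}$.

A necessary and sufficient condition that the plot be invariant is that $w^{(2)} = 0$, which is equivalent to $\Sigma e = ce + k\beta$, or $\Sigma e = e + k\beta$ by Lemma 1. But then, equating the coefficient of $\beta$ in $\Sigma w^{(2)} = 0$ to $k$ and using Lemma 1, we obtain $n\bar v(e'\Sigma^{-1}e)/(\beta'\Sigma^{-1}\beta) = 0$, or $\bar v = 0$, and the plot is unique and coincides (see (U.S.D. 2, 3, and 4)) with one which is optimal for the unbiased estimation of $\sigma$. Now suppose only that $\Sigma e = e + k\beta$. Let $\bar v = 0$. Then $w^{(2)} = 0$ and

(U.P.7) $w = v_0\left(\dfrac{\Sigma^{-1}\beta}{\beta'\Sigma^{-1}\beta} + nke\right)$

furnishes an invariant optimal unbiased plot. Substituting in (U.P. 5), we obtain (U.P. 6).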

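THEOREM 2. There is an invariant optimal biased plot if and only if $\Sigma e = e + k\beta$. In that case the invariant optimal plot is unique, has $\bar v = k$, and is optimal for the estimation of $\sigma$. Also,

(B.P.7) $E\{(\hat x_0 - x_0)^2\} = \sigma^2\,\dfrac{(\beta'\Sigma^{-1}\beta)\left(\dfrac{1}{n} + kv_0\right)^2 + \dfrac{v_0^2}{n} + \dfrac{1}{n^2}}{(\beta'\Sigma^{-1}\beta)\left(\dfrac{1}{n} + k^2\right) + \dfrac{1}{n}}$.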


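PROOF. As in the proof of Theorem 1, we find that a necessary and sufficient condition that a plot be invariant is that $w^{(2)} = 0$, where now $w = (v_0 - \bar v)w^{(1)} + w^{(2)}$ with

$w^{(1)} = \dfrac{1}{\Delta^*}\left[(e'\Sigma^{-1}e)\Sigma^{-1}\beta - (e'\Sigma^{-1}\beta)\Sigma^{-1}e\right]$,

$w^{(2)} = \dfrac{1}{\Delta^*}\left\{\left[\bar v(e'\Sigma^{-1}e) - \dfrac{1}{n}(e'\Sigma^{-1}\beta)\right]\Sigma^{-1}\beta + \left[\dfrac{\beta'\Sigma^{-1}\beta + 1}{n} - \bar v(e'\Sigma^{-1}\beta)\right]\Sigma^{-1}e\right\} - e$.

Hence, if a plot is invariant, $\Sigma e = ce + k\beta$, which by Lemma 1 equals $e + k\beta$. But if $w^{(2)} = 0$, the coefficient of $e$ must be one:

$\dfrac{1}{\Delta^*}\left[\dfrac{\beta'\Sigma^{-1}\beta + 1}{n} - \bar v(e'\Sigma^{-1}\beta)\right] = 1$, or, by Lemma 1, $\bar v = k$,

and the plot is unique and coincides (see (B.S.D. 2)) with one which is optimal for the estimation of $\sigma$. Now suppose $\Sigma e = e + k\beta$. Let $\bar v = k$. Then $w^{(2)} = 0$ and

(B.P.8) $w = \dfrac{v_0 - k}{\Delta^*}\left[(k\,\beta'\Sigma^{-1}\beta)\,e + \dfrac{1}{n}\Sigma^{-1}\beta\right]$

furnishes an invariant optimal plot. Substituting in (B.P.6), we obtain (B.P.7).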







COROLLARY 1. A necessary and sufficient condition that $\bar x$ is the optimal unbiased weighted average for estimating $\mu$ is that $\Sigma e = e + k\beta$.

PROOF. The sufficiency is trivial. For the necessity, we note that if $\bar x$ is optimal, $w = 0$ and hence $\Sigma e = \lambda_1\beta + \lambda_2 e$, and by Lemma 1, $\Sigma e = e + k\beta$.

COROLLARY 2. A necessary and sufficient condition that $\bar x$ is the optimal weighted average for estimating the $p_0 = F(v_0)$ percentile is that $\Sigma e = e + v_0\beta$.

The proof is similar to that of Corollary 1 and is omitted. Note that as a particular case, $\bar x$ is optimal for estimating $\mu$ if and only if $\Sigma e = e$.

The following corollaries are of some interest because they relate notions like sufficiency, completeness, and min-max estimates with the covariance matrix of the ordered observations. On the other hand they may not yield many applications even if they were refined in the more or less obvious ways.

COROLLARY 3. If $\bar x$ is a function of a sufficient statistic for $(\mu, \sigma)$ whose family of distributions is complete, then $\Sigma e = e + k\beta$.

PROOF. Under the above assumptions, it follows that $\bar x$ is a minimum variance unbiased estimate of $\mu$. (See [1] and [10].) By Corollary 1, the result follows.

COROLLARY 4. If for fixed $\sigma$, $\bar x$ is a min-max estimate of $\mu$ with respect to the quadratic loss function, then $\Sigma e = e$.

PROOF. Let $t$ be a weighted average of the ordered observations. Its risk is given by

$r_t = E\{(t - \mu)^2\} = a_t\sigma^2$,

where $a_t$ is independent of $\mu$ and $\sigma$. Similarly,

$r_{\bar x} = E\{(\bar x - \mu)^2\} = \dfrac{\sigma^2}{n}$.

If $\bar x$ is min-max, we must have $a_t \ge 1/n$ for every such $t$. Corollary 2 then yields our result.

This proof would apply equally well if it were given that $\bar x$ is a min-max estimate among invariant estimates with respect to the loss function $(t - \mu)^2/\sigma^2$.

Results similar to the above corollaries have been obtained by Lloyd [11] for the special case of symmetric distributions. In that case if $\Sigma e = e + k\beta$, $k$ is necessarily zero.


7. Examples.

EXAMPLE 1. The exponential distribution. Let us derive the means and covariances of the order statistics of a sample of $n$ independent observations with density

(22) $f(z) = e^{-z}$ for $z \ge 0$, $\qquad f(z) = 0$ for $z < 0$.

Following the method of Epstein and Sobel [6], we note first that if $Z$ has the above exponential density, then the conditional density of $Z - a$, given $Z \ge a$, also has this density. If we let $z_{i,n}$ be the $i$th order statistic from a sample of size $n$, it follows that $z_{2,n} - z_{1,n}, z_{3,n} - z_{1,n}, \ldots, z_{n,n} - z_{1,n}$ have the same joint distribution as the order statistics from a sample of size $n - 1$ and are independent of $z_{1,n}$. Then we might write

$z_{i,n} = z_{1,n} + z_{i-1,n-1}$ for $i = 2, 3, \ldots, n$,

where $z_{i-1,n-1}$ is independent of $z_{1,n}$. Also

$E\{z_{i,n}\} = E\{z_{1,n}\} + E\{z_{i-1,n-1}\}$ for $i = 2, 3, \ldots, n$,

$\sigma_{z_{1,n}z_{i,n}} = \sigma_{z_{1,n}z_{1,n}}$ for $i = 2, 3, \ldots, n$,

and

$\sigma_{z_{i,n}z_{j,n}} = \sigma_{z_{1,n}z_{1,n}} + \sigma_{z_{i-1,n-1}z_{j-1,n-1}}$ for $i, j = 2, 3, \ldots, n$.

Now,

$\Pr\{z_{1,n} > a\} = \Pr\{Z > a\}^n = e^{-na}$,

which implies that $E\{z_{1,n}\} = 1/n$ and $\sigma^2_{z_{1,n}} = 1/n^2$. It follows that

(23) $E\{z_{i,n}\} = \dfrac{1}{n} + \dfrac{1}{n-1} + \cdots + \dfrac{1}{n-i+1}$

and

$\sigma_{z_{i,n}z_{j,n}} = \dfrac{1}{n^2} + \dfrac{1}{(n-1)^2} + \cdots + \dfrac{1}{(n-i+1)^2}, \quad i \le j$.

The chance variable $Z$ has mean 1 and variance 1. We normalize to $Y = Z - 1$ and it follows that

$\beta_i = \dfrac{1}{n} + \dfrac{1}{n-1} + \cdots + \dfrac{1}{n-i+1} - 1$,

(24) $\sigma_{ij} = \dfrac{1}{n^2} + \dfrac{1}{(n-1)^2} + \cdots + \dfrac{1}{(n-i+1)^2}, \quad i \le j$,

and therefore

(25) $\Sigma e = e + \dfrac{1}{n}\beta$.

It follows from our results of the preceding section that there are invariant optimal plots for both the biased and unbiased estimation problems, that $\bar x$ is the minimum variance unbiased estimate of $\mu$ among the weighted averages of the ordered observations, and that $\bar x$ is not the optimal weighted average for estimating $\mu$ if bias is permitted. In fact, $\bar x$ is the optimal weighted average for estimating the $1 - e^{-(n+1)/n}$ percentile.

In this particular problem it is easy to show that

(26) $e'\Sigma^{-1} = (n, 0, 0, \ldots, 0)$,

(27) $\beta'\Sigma^{-1} = (1 - n^2, 1, 1, \ldots, 1)$.
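The identities (25)-(27) can be checked exactly with rational arithmetic. The sketch below is our own (function names hypothetical): it builds $\beta$ and $\Sigma$ for the standardized exponential distribution from the formulas preceding (25), using 0-based indices, and verifies each identity by multiplying by $\Sigma$ instead of inverting it.

```python
from fractions import Fraction


def exponential_order_moments(n):
    """beta_i = E(y_i) and sigma_ij for y = z - 1, z standard exponential (0-based i, j)."""
    beta = [sum(Fraction(1, n - l) for l in range(i + 1)) - 1 for i in range(n)]
    Sigma = [[sum(Fraction(1, (n - l) ** 2) for l in range(min(i, j) + 1))
              for j in range(n)] for i in range(n)]
    return beta, Sigma


def matvec(M, x):
    return [sum(a * b for a, b in zip(row, x)) for row in M]


def check_exponential_identities(n):
    """Return the truth of (25), (26), (27) and of beta' Sigma^{-1} beta = n(n - 1)."""
    beta, Sigma = exponential_order_moments(n)
    e = [Fraction(1, n)] * n
    ok25 = matvec(Sigma, e) == [ei + bi / n for ei, bi in zip(e, beta)]   # Sigma e = e + beta/n
    ok26 = matvec(Sigma, [n] + [0] * (n - 1)) == e                       # Sigma^{-1} e = (n, 0, ..., 0)'
    g = [1 - n * n] + [1] * (n - 1)                                       # claimed Sigma^{-1} beta
    ok27 = matvec(Sigma, g) == beta
    A = sum(b * gi for b, gi in zip(beta, g))                             # beta' Sigma^{-1} beta
    return ok25, ok26, ok27, A == n * (n - 1)
```

For $n = 3$, for example, $\beta = (-2/3, -1/6, 5/6)'$ and $\beta'\Sigma^{-1}\beta = 6$.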







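For the optimal unbiased plot, we have

(28) $p_1 = 1 - e^{-1/n}$ (that is, $v_1 = -(n-1)/n$),

(29) $p_i = 1 - e^{-(1 + 1/n)}$ (that is, $v_i = 1/n$), $i = 2, 3, \ldots, n$,

(30) $E\{(\hat\sigma - \sigma)^2\} = \dfrac{\sigma^2}{n - 1}$,

and

(31) $E\{(\hat x_0 - x_0)^2\} = \sigma^2\left\{\dfrac{(1 + v_0)^2}{n} + \dfrac{v_0^2}{n(n-1)}\right\}$.

On the other hand, for the optimal biased plot,

(32) $p_1 = 1 - e^{-1/n}$,

(33) $p_i = 1 - e^{-(1 + 1/n) - 1/(n-1)}$, $i = 2, 3, \ldots, n$,

(34) $E\{(\hat\sigma - \sigma)^2\} = \dfrac{\sigma^2}{n}$,

and

(35) $E\{(\hat x_0 - x_0)^2\} = \dfrac{\sigma^2}{n}\left[(1 + v_0)^2 - \dfrac{1 + 2v_0}{n} + \dfrac{1}{n^2}\right]$.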

In this somewhat extraordinary example all order statistics except the first are plotted at the same probability level. Of course, this property, imposed by our criteria for graphing, makes the graph useless for the purpose of testing whether the distribution is exponential. In this case there seems little to be gained by using the above plots instead of algebra.

From another point of view, it is not surprising that these results appeared. In fact, for the exponential distributions with unknown scale and location parameters, a sufficient statistic for $(\mu, \sigma)$ is given by $(x_1, \bar x)$. It is not surprising, then, that the above plots lump all except the first observation at one level.

In fact, it can be shown that the family of distributions of $(x_1, \bar x)$ is complete, and then Corollary 3 gives us the fact that $\Sigma e = e + k\beta$.

EXAMPLE 2. The normal distribution. It is known [14] that $\bar x$ is a min-max estimate of $\mu$. It follows that $\Sigma e = e$. Since the normal distribution is symmetric, $\Sigma e = e$ can also be inferred from Lloyd's paper [11]. It is of interest to note that the result was also obtained by Jones [9] by a method which resembles this approach in that it uses the fact that $\bar x$ is a good weighted average of the ordered observations for estimating $\mu$. This result can also be obtained as follows. The differences of unordered observations are independent of $\bar x$, and $x_i - x_j$ is a function of these differences. Therefore, $\bar x$ is uncorrelated with $(x_i - x_j)$ and

$\sigma_{x_i\bar x} - \sigma_{x_j\bar x} = \dfrac{1}{n}\sum_{k=1}^{n}(\sigma_{x_ix_k} - \sigma_{x_jx_k}) = 0$.

Thus the sums of the elements of the rows of the covariance matrix are the same for each row. Since the sum of all the elements of $\Sigma$ is $\operatorname{Var}(y_1 + \cdots + y_n) = n$, each row sum is one and, finally, $\Sigma e = e$.







Tables have been presented in [2] for the optimal plots using normal probability paper for sample sizes up to $n = 10$. These will be extended [3] to $n = 20$.

Note that $\Sigma e = e$ implies that $\beta'\Sigma^{-1}e = \beta'e = 0$. Hence the $v_i$ for the optimal biased and unbiased plots are proportional to $\Sigma^{-1}\beta$. This relation also implies that $e$ is a characteristic vector of $\Sigma$ corresponding to characteristic value 1. Furthermore, since all the elements of $\Sigma$ are positive, 1 is the largest characteristic value of $\Sigma$.


8. Concluding Remarks. The problems treated in this paper have been rela- 
tively simple. There are a number of questions which are more difficult and are 
as yet unsolved. 

First, it would be very interesting to know under what conditions one can be assured that the optimal $v_i$ are increasing. It would be embarrassing to propose that the smallest observation be plotted at a higher $p$-level than the second smallest. One conjecture is that $\Sigma e = e + k\beta$ implies the desired result.

Second, for distributions which correspond to chance variables which are bounded, the range of $v$ corresponding to values of $p$ between 0 and 1 is similarly bounded. Under what conditions will the optimal plot involve only values of $v$ which correspond to values of $p$ between 0 and 1?

Third, what is the class of continuous distributions for which $\Sigma e = e$ for all $n$? For $n = 2$, all symmetric distributions have this property. Could it be that only the normal distribution has this property for all $n$ or even for any $n > 2$?

Fourth, what are the asymptotic properties of the optimal plots as $n \to \infty$?

Finally, the important question of how to plot in order to furnish a test that 
the distribution is in the given family is still untreated. 

We conclude with the following remark. Suppose that instead of fitting the 
least-squares line to the plotted points, we fitted a modified least-squares line 
where the points had specified weights not all equal. Then it can be shown that 
there is a one-to-one correspondence of plots such that equivalent estimates are 
obtained by the two methods. 

REFERENCES

[1] D. BLACKWELL, "Conditional expectation and unbiased sequential estimation," Ann. Math. Stat., Vol. 18 (1947), pp. 105-110.

[2] H. CHERNOFF AND G. J. LIEBERMAN, "Use of normal probability paper," J. Amer. Stat. Assoc., Vol. 49 (1954), pp. 778-784.

[3] H. CHERNOFF AND G. J. LIEBERMAN, "Engineering applications of normal probability paper," unpublished paper.

[4] F. DOWNTON, "A note on ordered least-squares estimation," Biometrika, Vol. 40 (1953), pp. 457-458.

[5] F. DOWNTON, "Least-squares estimates using ordered observations," Ann. Math. Stat., Vol. 25 (1954), pp. 303-316.

[6] B. EPSTEIN AND M. SOBEL, "Life testing," J. Amer. Stat. Assoc., Vol. 48 (1953), pp. 486-502.

[7] H. J. GODWIN, "On the estimation of dispersion by linear systematic statistics," Biometrika, Vol. 36 (1949), pp. 92-100.

[8] L. A. GOODMAN, "A simple method for improving some estimators," Ann. Math. Stat., Vol. 24 (1953), pp. 114-117.

[9] H. L. JONES, "Exact lower moments of order statistics in small samples from a normal distribution," Ann. Math. Stat., Vol. 19 (1948), pp. 270-273.

[10] E. L. LEHMANN AND H. SCHEFFÉ, "Completeness, similar regions, and unbiased estimation. Part I," Sankhyā, Vol. 10 (1950), pp. 305-340.

[11] E. H. LLOYD, "Least-squares estimation of location and scale parameters using order statistics," Biometrika, Vol. 39 (1952), pp. 88-95.

[12] A. E. SARHAN, "Estimation of the mean and standard deviation by order statistics," Ann. Math. Stat., Vol. 25 (1954), pp. 317-328.

[13] A. E. SARHAN, "Estimation of the mean and standard deviation by order statistics, Part II and Part III," Ann. Math. Stat., Vol. 26 (1955), pp. 505-511 and pp. 576-592.

[14] J. WOLFOWITZ, "Min-max estimates of the mean of a normal distribution with known variance," Ann. Math. Stat., Vol. 21 (1950), pp. 182-197.





THE MODIFIED MEAN SQUARE SUCCESSIVE DIFFERENCE AND
RELATED STATISTICS¹


By SEYMOUR GEISSER
University of North Carolina 


1. Introduction. In estimating the variance of a normal population one uses the statistic $s^2 = (n - 1)^{-1}\sum_{i=1}^{n}(x_i - \bar x)^2$ because of its optimum properties. In certain cases where there is an indeterminable trend in the data, it has been thought useful to estimate the variance by another statistic, namely the mean square successive difference (the mean of the squared first differences) studied by J. von Neumann et al. [5], which eliminates a good deal of the trend and under some conditions is less biased than $s^2$. An explicit form of the exact distribution of this statistic seems, at least for the present, too difficult to obtain. However, by applying a device analogous to one used by Durbin and Watson [1], that is, by dropping from the mean square successive difference the middle term for an even number of observations and the two middle terms for the odd case, we find that the quadratic form has double roots, thus enabling us to obtain exact distributions in terms of elementary functions. In addition we define analogues of the Student $t$ and the Fisher $F$ using similarly modified statistics and derive their exact distributions when the observations are independent.

The results of this paper are mainly the exact distributions of these statistics 
and were given at the April 1955 meetings of the Institute of Mathematical 
Statistics. A short while after, these same results, independently derived, were 
published by A. R. Kamat [3]. Since Kamat has already published the exact dis- 
tributions of these statistics and the motivation for them, it would be inappro- 
priate to rederive them here; hence we shall only state the results and give that 
material that Kamat had not considered in his paper. 


2. The modified mean square successive difference. Let $x_i$ be $N(0, \sigma^2)$ and let $x_1, \ldots, x_{2m}$ be independent. We define the modified mean square successive difference to be

(2.1) $\delta_0^2 = 4^{-1}(m - 1)^{-1}\sum_{i=1,\ i \ne m}^{2m-1}(x_{i+1} - x_i)^2$.

The exact density of $\delta_0^2$ is

(2.2) $p_m(\delta_0^2) = \dfrac{4^{m-2}(m - 1)}{m\sigma^2}\sum_{k=1}^{m-1}(-1)^{k+1}\sin^2\dfrac{k\pi}{m}\cos^{2m-6}\dfrac{k\pi}{2m}\exp\left[-\dfrac{(m - 1)\delta_0^2\sec^2(k\pi/2m)}{2\sigma^2}\right]$.
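A direct transcription of (2.1) and (2.2) is sketched below in Python (our own illustration; the function names are hypothetical). For $m = 2$ the density reduces to $e^{-\delta_0^2/\sigma^2}/\sigma^2$, which gives a convenient check, and for larger $m$ one can confirm numerically that (2.2) integrates to one with mean $\sigma^2$.

```python
import math


def modified_mssd(x):
    """delta_0^2 of (2.1) for n = 2m observations: the middle difference (i = m) is dropped."""
    n = len(x)
    if n % 2 or n < 4:
        raise ValueError("need an even number n = 2m >= 4 of observations")
    m = n // 2
    # 0-based index i is the difference x_{i+2} - x_{i+1}; 1-based term i = m is index m - 1
    ssq = sum((x[i + 1] - x[i]) ** 2 for i in range(n - 1) if i != m - 1)
    return ssq / (4 * (m - 1))


def density(d, m, sigma2=1.0):
    """p_m(delta_0^2) of (2.2), evaluated at delta_0^2 = d."""
    c = 4 ** (m - 2) * (m - 1) / (m * sigma2)
    total = 0.0
    for k in range(1, m):
        h = k * math.pi / (2 * m)               # k pi / 2m; sin(2h) = sin(k pi / m)
        total += ((-1) ** (k + 1) * math.cos(h) ** (2 * m - 6) * math.sin(2 * h) ** 2
                  * math.exp(-(m - 1) * d / (2 * sigma2 * math.cos(h) ** 2)))
    return c * total
```

For example, the four observations $0, 1, 3, 6$ give $\delta_0^2 = (1^2 + 3^2)/4 = 2.5$, the middle difference $3 - 1$ being dropped.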


Received June 28, 1955. 

¹ Work under contract with the Office of Naval Research NR 042 031, for investigations in statistics and probability at Chapel Hill. Reproduction in whole or in part is permitted for any purpose of the United States government.







820 SEYMOUR GEISSER 
and the cumulative distribution function is

(2.3) $P_m(\delta_0^2) = 1 - 2^{2m-3}m^{-1}\sum_{k=1}^{m-1}(-1)^{k+1}\sin^2\dfrac{k\pi}{m}\cos^{2m-4}\dfrac{k\pi}{2m}\exp\left[-\dfrac{(m - 1)\delta_0^2\sec^2(k\pi/2m)}{2\sigma^2}\right]$.
We shall also show that this statistic is asymptotically normal by showing that the mean square successive difference $\delta^2$ is asymptotically normal, using the central limit theorem for dependent random variables of Hoeffding and Robbins [2]. Let

$2(n - 1)\delta^2 = \sum_{i=1}^{n-1}(x_{i+1} - x_i)^2$

and let

$w_i = (x_{i+1} - x_i)^2 - 2\sigma^2$.

Now $Ew_i = 0$ and

(2.4) $Ew_iw_{i+j} = 0$ if $j > 1$; $\quad 2\sigma^4$ if $j = 1$; $\quad 8\sigma^4$ if $j = 0$.

The set $w_1, w_2, \ldots, w_{n-1}$ is a 1-dependent sequence; i.e., the set $(w_1, w_2, \ldots, w_r)$ is independent of the set $(w_{r+2}, \ldots, w_{n-1})$ for $r = 1, 2, \ldots, n - 2$. Let

(2.5) $P_i = Ew_{i+1}^2 + 2Ew_iw_{i+1}$ for $i = 1, 2, \ldots$.

Therefore $P_i = 12\sigma^4$ for all $i$. Since all the other conditions of the Hoeffding-Robbins theorem are satisfied, we have for every real $a$ and $b$,

$\lim_{n\to\infty}\Pr\left[\sigma^2 + 3^{1/2}a\sigma^2(n - 1)^{-1/2} < \delta^2 < \sigma^2 + 3^{1/2}b\sigma^2(n - 1)^{-1/2}\right] = F(b) - F(a)$,

where $F(x)$ is the cumulative normal distribution function,

$F(x) = (2\pi)^{-1/2}\int_{-\infty}^{x}e^{-t^2/2}\,dt$.

Since $\delta_0^2$ is in reality the sum of two $\delta^2$'s, it is also asymptotically normal.
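The normalization $3^{1/2}\sigma^2(n - 1)^{-1/2}$ comes from (2.4): the $w_i$ have variance $8\sigma^4$ and lag-one covariance $2\sigma^4$, so that $P_i = 12\sigma^4$. The short sketch below (ours; function names hypothetical) computes the exact variance of $\delta^2$ from (2.4) and compares it with the limiting value $3\sigma^4/(n - 1)$.

```python
def var_delta2_exact(n, sigma2=1.0):
    """Var(delta^2) from (2.4), where 2(n - 1) delta^2 is a sum of n - 1 squared differences."""
    s4 = sigma2 ** 2
    var_sum_w = 8 * s4 * (n - 1) + 2 * (2 * s4) * (n - 2)   # variances + lag-one covariances
    return var_sum_w / (2 * (n - 1)) ** 2


def var_delta2_limit(n, sigma2=1.0):
    """Variance implied by the normalization above: 12 sigma^4 (n - 1) / (2(n - 1))^2."""
    return 3 * sigma2 ** 2 / (n - 1)
```

The ratio of the two is $1 - 1/\{3(n - 1)\}$, so the exact variance is already within one per cent of the limit for $n$ around 35.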
3. Moments. We shall now evaluate the moments of $\delta_0^2$ by integrating (2.2). Hence

$E\delta_0^{2r} = \sigma^{2r}\,2^{2m-3+r}\,r!\,m^{-1}(m - 1)^{-r}S$,
MODIFIED MEAN SQUARE $21 
where

$S = \sum_{k=1}^{m-1}(-1)^{k-1}\cos^{2(m+r-2)}\dfrac{k\pi}{2m}\sin^2\dfrac{k\pi}{m}$

$\quad = 4\sum_{k=1}^{m-1}(-1)^{k-1}\cos^{2(m+r-1)}\dfrac{k\pi}{2m} - 4\sum_{k=1}^{m-1}(-1)^{k-1}\cos^{2(m+r)}\dfrac{k\pi}{2m}$

(3.1) $\quad = 4(S_1 - S_2)$,

using $\sin^2(k\pi/m) = 4\sin^2(k\pi/2m)\cos^2(k\pi/2m)$, where

$S_1 = \sum_{k=1}^{m-1}(-1)^{k-1}\cos^{2(m+r-1)}\dfrac{k\pi}{2m}, \qquad S_2 = \sum_{k=1}^{m-1}(-1)^{k-1}\cos^{2(m+r)}\dfrac{k\pi}{2m}$.

Now according to Schwatt ([4], p. 222), $S_1$ can be reduced to finite sums of binomial coefficients; similarly, $S_2$ is the same as $S_1$ with $m + r$ in place of $m + r - 1$. Now for $m$ even,

$S_1 = 4^{1-m-r}\left[\sum_{a=0}^{m+r-1}\binom{2m+2r-2}{m+r-1-a} + \sum_{a=1}^{m+r-1}\binom{2m+2r-2}{m+r-1-a}\dfrac{\cos\{(2m-1)a\pi/2m\}}{\cos(a\pi/2m)}\right] = 4^{1-m-r}(S_{11} + S_{12})$,







where $S_{11}$ is equal to the first sum above and $S_{12}$ is equal to the second sum above.

Now

$S_{11} = \sum_{a=0}^{m+r-1}\binom{2m+2r-2}{m+r-1-a} = \dfrac{1}{2}\left[2^{2m+2r-2} + \binom{2m+2r-2}{m+r-1}\right]$.

Further

$\dfrac{\cos\{(2m-1)a\pi/2m\}}{\cos(a\pi/2m)} = (-1)^a$

for all integral values of $a$ except $a = (2y - 1)m$, $y = 1, 2, \ldots$. However, for $m$ even,

$\lim_{a\to(2y-1)m}\dfrac{\cos\{(2m-1)a\pi/2m\}}{\cos(a\pi/2m)} = -(2m - 1)$.

We shall evaluate $S_{12}$ for $r \le 2m$, so that $a = m$ is the only exceptional value in the range of summation:

$S_{12} = \sum_{a=1}^{m+r-1}(-1)^a\binom{2m+2r-2}{m+r-1-a} - 2m\binom{2m+2r-2}{r-1} = -\dfrac{1}{2}\binom{2m+2r-2}{m+r-1} - 2m\binom{2m+2r-2}{r-1}$.

Therefore

$S_1 = 4^{1-m-r}\left[2^{2m+2r-3} - 2m\binom{2m+2r-2}{r-1}\right]$,

and similarly

$S_2 = 4^{-m-r}\left[2^{2m+2r-1} - 2m\binom{2m+2r}{r}\right]$.

Therefore

$S = m\,4^{2-m-r}(2m^2 - m - r)\dfrac{(2m + 2r - 2)!}{r!\,(2m + r)!}$.

Substituting in (3.1) we get

(3.3) $E\delta_0^{2r} = \sigma^{2r}(m - 1)^{-r}2^{1-r}(2m^2 - m - r)\dfrac{(2m + 2r - 2)!}{(2m + r)!}$

for $r \le 2m - 1$. Similarly for $m$ odd the same type of derivation is carried out with the same result (3.3).
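Formula (3.3) is easy to evaluate exactly with rational arithmetic. In the sketch below (ours; the function name is hypothetical) a useful check is $m = 2$: then $\delta_0^2/\sigma^2$ is a standard exponential variable, so its $r$th moment must be $r!$, and (3.3) gives $1, 2, 6$ for $r = 1, 2, 3$. The sketch restricts $r$ to $1 \le r \le 2m - 1$; at $m = 2$, $r = 4$ the closed form gives $45/2$ rather than $4! = 24$, because the evaluation of $S_2$ then meets a second exceptional value $a = 3m$.

```python
from fractions import Fraction
from math import factorial


def moment(m, r, sigma2=Fraction(1)):
    """E(delta_0^{2r}) from (3.3), for m >= 2 and 1 <= r <= 2m - 1."""
    if m < 2 or not 1 <= r <= 2 * m - 1:
        raise ValueError("need m >= 2 and 1 <= r <= 2m - 1")
    return (Fraction(sigma2) ** r * Fraction(2) ** (1 - r) * (2 * m * m - m - r)
            * Fraction(factorial(2 * m + 2 * r - 2), factorial(2 * m + r))
            / Fraction(m - 1) ** r)
```

The first two moments give $E\delta_0^2 = \sigma^2$ and $\operatorname{Var}\delta_0^2 = (3m - 4)\sigma^4/\{2(m - 1)^2\}$ for every $m \ge 2$.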







4. Variance Ratio Analogue. Let us now consider two independent random samples of sizes $2m_1$ and $2m_2$ whose values are $x_i$ ($i = 1, 2, \ldots, 2m_1$) and $y_j$ ($j = 1, 2, \ldots, 2m_2$). These provide estimates $\delta_{01}^2$ and $\delta_{02}^2$ of the variances of the populations. We wish to consider whether the samples may be regarded as drawn from the same normal population of variance $\sigma^2$. If we consider the ratio $y = \delta_{01}^2/\delta_{02}^2$, we have an analogue of Fisher's $F$. Since $\delta_{01}^2$ is independent of $\delta_{02}^2$, the probability density of $y$ is given by

(4.1) $h(y) = \int_0^\infty \delta_{02}^2\,p_{m_1}(y\delta_{02}^2)\,p_{m_2}(\delta_{02}^2)\,d(\delta_{02}^2)$.

Using this quotient convolution formula we get the density of $y$ to be

(4.2) $h(y) = 4^{m_1+m_2-3}(m_1 - 1)(m_2 - 1)m_1^{-1}m_2^{-1}\sum_{k=1}^{m_1-1}\sum_{j=1}^{m_2-1}(-1)^{k+j}\cos^{2m_1-6}\dfrac{k\pi}{2m_1}\cos^{2m_2-6}\dfrac{j\pi}{2m_2}\sin^2\dfrac{k\pi}{m_1}\sin^2\dfrac{j\pi}{m_2}\left[y(m_1 - 1)\sec^2\dfrac{k\pi}{2m_1} + (m_2 - 1)\sec^2\dfrac{j\pi}{2m_2}\right]^{-2}$.
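The density (4.2) can be evaluated by pairing the exponential terms of (2.2) for the two samples. The sketch below (ours; names hypothetical) does this with $\sigma^2 = 1$, which cancels from the ratio. For $m_1 = m_2 = 2$ both statistics are exponential and $h(y) = (1 + y)^{-2}$; for $m_1 = m_2$ in general, $y$ and $1/y$ have the same distribution, so $h(y) = h(1/y)/y^2$.

```python
import math


def _terms(m):
    """(coefficient, rate) pairs with p_m(d) = sum coef * exp(-rate * d / 2) when sigma^2 = 1."""
    c = 4 ** (m - 2) * (m - 1) / m
    out = []
    for k in range(1, m):
        h = k * math.pi / (2 * m)
        a = (-1) ** (k + 1) * math.cos(h) ** (2 * m - 6) * math.sin(2 * h) ** 2
        out.append((c * a, (m - 1) / math.cos(h) ** 2))
    return out


def ratio_density(y, m1, m2):
    """Density (4.2) of y = delta_01^2 / delta_02^2 when both samples share one variance."""
    total = 0.0
    for a1, b1 in _terms(m1):
        for a2, b2 in _terms(m2):
            # integral over t > 0 of t * exp(-(y b1 + b2) t / 2) dt = 4 / (y b1 + b2)^2
            total += a1 * a2 * 4.0 / (y * b1 + b2) ** 2
    return total
```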
5. An analogue of the Student $t$. We will now give the distribution of

(5.1) $\xi = (2m)^{1/2}(\bar x - \mu)/\delta_0$,

where $Ex_i = \mu$ and $(2m)^{-1}\sum_{i=1}^{2m}x_i = \bar x$. Since $\delta_0^2$ is invariant under a translation, it is independent of the numerator and we may again apply the quotient convolution formula. Hence the density of $\xi$ is

$v(\xi) = 4^{m-2}(m - 1)m^{-1}\sum_{k=1}^{m-1}(-1)^{k+1}\cos^{2m-6}\dfrac{k\pi}{2m}\sin^2\dfrac{k\pi}{m}\left[\xi^2 + (m - 1)\sec^2\dfrac{k\pi}{2m}\right]^{-3/2}$,

where $-\infty < \xi < \infty$.

It can be shown that $\xi \to N(0, 1)$ by considering $\xi^2 = 2m(\bar x - \mu)^2/\delta_0^2$. If we divide the numerator and the denominator of $\xi^2$ by $\sigma^2$, the resultant denominator converges in probability to 1 as $m$ increases and the numerator is a chi-square variable with 1 degree of freedom for all $m$. Hence $\xi^2$ converges to a $\chi_1^2$ variable, and since $\xi$ is symmetric, it tends to a $N(0, 1)$ variable.
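The density of $\xi$ has the same structure as (2.2), with each exponential replaced by $[\xi^2 + (m - 1)\sec^2(k\pi/2m)]^{-3/2}$. A sketch (ours; the name `xi_density` is hypothetical) that evaluates it; for $m = 2$ it reduces to $(\xi^2 + 2)^{-3/2}$, and a simple trapezoidal rule confirms that it integrates to one.

```python
import math


def xi_density(t, m):
    """Density of xi = (2m)^{1/2} (xbar - mu) / delta_0; sigma cancels."""
    c = 4 ** (m - 2) * (m - 1) / m
    total = 0.0
    for k in range(1, m):
        h = k * math.pi / (2 * m)
        total += ((-1) ** (k + 1) * math.cos(h) ** (2 * m - 6) * math.sin(2 * h) ** 2
                  * (t * t + (m - 1) / math.cos(h) ** 2) ** -1.5)
    return c * total
```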


6. Tables. The application of these statistics to control charts has been discussed by Kamat [3]. We shall give a table of the upper and lower .025 points of $\delta_0^2/\sigma^2$ and the two-tailed .05 points of $\xi$, which is symmetric. In the table, $n = 2m$ is the number of observations. The values for $n \le 20$ have been computed directly from the cumulative distribution functions. For $n > 20$ we find the values by an approximate procedure. We let

(6.1) $P_n = P_\infty + a_1m^{-1} + a_2m^{-2} + \cdots$,

(6.2) $P_n \approx P_\infty + a_1m^{-1} + a_2m^{-2} + a_3m^{-3}$.

Since we have exact values for $P_n$ for $n \le 20$, we choose three of these values and get three simultaneous equations in $a_1$, $a_2$, and $a_3$. We then use the values for $a_1$, $a_2$, and $a_3$ in (6.2) to extend Table I.


TABLE I

         $\delta_0^2/\sigma^2$
  n      upper .025   lower .025
  4        3.689        .026
  6        3.071        .106
  8        2.694        .172
 10        2.458        .225
 12        2.294        .269
 14        2.172        .306
 16        2.078        .338
 18        2.001        .366
 20        1.938        .391
 22*       1.866        .405
 24*       1.810        .423
 26*       1.761        .442
 28*       1.718        .463
 30*       1.677        .483
 32*       1.645        .502
 34*       1.612        .520
 36*       1.583        .538
 38*       1.558        .554
 40*       1.534        .570
 42*       1.511        .585
 44*       1.491        .599
 46*       1.472        .612
 48*       1.454        .624
 50*       1.437        .637

* For these n's the value calculated in the table is approximate.


7. Acknowledgment. I am extremely grateful to Professor Harold Hotelling 
for guidance and help in this research. 


REFERENCES 

[1] J. DURBIN AND G. S. WATSON, "Exact tests of serial correlation using non-circular statistics," Ann. Math. Stat., Vol. 22 (1951), pp. 446-451.

[2] W. HOEFFDING AND H. ROBBINS, "The central limit theorem for dependent random variables," Duke Math. J., Vol. 15 (1948), pp. 773-780.

[3] A. R. KAMAT, "Modified mean square successive difference with an exact distribution," Sankhyā, Vol. 15, Pt. 3 (1955), pp. 295-302.

[4] I. J. SCHWATT, An Introduction to the Operations with Series, University of Pennsylvania Press, Philadelphia (1924).

[5] J. VON NEUMANN, R. H. KENT, H. R. BELLINSON, AND B. I. HART, "The mean square successive difference," Ann. Math. Stat., Vol. 12 (1941), pp. 153-162.







NOTES 


ON THE TUKEY TEST FOR THE EQUALITY OF MEANS AND THE
HARTLEY TEST FOR THE EQUALITY OF VARIANCES¹ ²


By K. V. RAMACHANDRAN³


University of North Carolina 


1. Summary. The unbiasedness of the Tukey Studentized range test for the equality of means of $k$ univariate normal populations with a common variance and of the Hartley $F_{\max}$ ratio test for the equality of variances of $k$ univariate normal populations is proved.


2. Introduction. The purpose of this paper is to establish the unbiasedness of 
two tests which are derived by the union-intersection principle [2], the tests 
being within the Neyman-Pearson set-up of two-decision problems. 


3. The Tukey $q$-test. Let $x_{ij}$ ($i = 1, 2, \ldots, k$; $j = 1, 2, \ldots, n$) be the elements of $k$ independent samples of size $n$ from normal populations with means $\mu_i$ and variance $\sigma^2$ ($i = 1, 2, \ldots, k$). Also let $s^2$ be an independent and unbiased estimate of $\sigma^2$ based on $m$ d.f. (say, the error mean square in anova). It is well known that $\bar x_i = \sum_{j=1}^{n}x_{ij}/n$ is normal with mean $\mu_i$ and variance $\sigma^2/n$.

To test the hypothesis $H_0: \mu_1 = \mu_2 = \cdots = \mu_k$ we proceed as follows. First we notice that $H_0$ is equivalent to the totality of all $H_{ij}: \mu_i = \mu_j$ ($i \ne j$; $i, j = 1, 2, \ldots, k$). Also for any two $\mu$'s, the hypothesis $\mu_i = \mu_j$ can be tested using Student's "$t$" with $m$ d.f. The hypothesis $\mu_i = \mu_j$ is accepted if $|\bar x_i - \bar x_j| \le t_\gamma s(2/n)^{1/2}$, where $t_\gamma$ is the upper $\gamma/2$ point of Student's "$t$" with $m$ d.f. Now since $H_0$ is equivalent to the totality of the hypotheses $H_{ij}$ ($i \ne j$; $i, j = 1, 2, \ldots, k$), we get a test of $H_0$ as follows: take the intersection of all the $\binom{k}{2}$ two-by-two Student's "$t_{ij}$" acceptance regions, and accept $H_0$ if

largest $|t_{ij}| = \sup_{i \ne j;\ i, j = 1, 2, \ldots, k}\dfrac{|\bar x_i - \bar x_j|}{s\sqrt{2/n}} \le t_\gamma$.

It is easy to check that this is the same as accepting $H_0$ if

(3.1) $q = \dfrac{\bar x_{\max} - \bar x_{\min}}{s\sqrt{1/n}} \le Q_\alpha$,


Received January 10, 1955; revised April 19, 1956. 

¹ This work was supported in part by the Office of Naval Research.

² Presented under a different title to the meetings of the Institute of Mathematical Statistics at Montreal, Canada, September, 1954.

³ Present address: Department of Statistics, University of Baroda, India.







826 K. V. RAMACHANDRAN 


where $Q_\alpha$ is the upper $\alpha$ point of the Studentized range $q$ with $m$ d.f. (Notice that $\sqrt{2}\,t_\gamma = Q_\alpha$.) This is the Tukey $q$-test [3].
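The equivalence between the largest pairwise $|t_{ij}|$ and the Studentized range, $q = \sqrt{2}\max|t_{ij}|$, is easy to confirm numerically. The sketch below is ours (names hypothetical); it computes both statistics but not the critical value $Q_\alpha$, which would require tables of the Studentized range or a suitable library routine.

```python
import math
from itertools import combinations


def tukey_q(means, s, n):
    """Studentized range q = (xbar_max - xbar_min) / (s * sqrt(1/n))."""
    return (max(means) - min(means)) / (s * math.sqrt(1.0 / n))


def largest_pairwise_t(means, s, n):
    """Largest |t_ij| = |xbar_i - xbar_j| / (s * sqrt(2/n)) over all pairs i != j."""
    return max(abs(a - b) for a, b in combinations(means, 2)) / (s * math.sqrt(2.0 / n))
```

With means $1, 2, 4$, $s = 2$, and $n = 4$, the sketch gives $q = 3$ and a largest $|t_{ij}|$ of $3/\sqrt{2}$, so accepting when every $|t_{ij}| \le t_\gamma$ is the same as accepting when $q \le \sqrt{2}\,t_\gamma$.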

Starting with the definition of the $q$-test, we have, for the probability of the second kind of error,

$\beta = \Pr\left\{\dfrac{\bar x_{\max} - \bar x_{\min}}{s\sqrt{1/n}} \le Q_\alpha\right\} = \Pr\{y_{\max} - y_{\min} \le Q_\alpha s'\}$,

where $y_i = \sqrt{n}\,\bar x_i/\sigma$ ($i = 1, 2, \ldots, k$) and $s' = s/\sigma$. Now $y_i$ is normal with mean $\mu_i'$ and variance unity, where $\mu_i' = (n^{1/2}/\sigma)\mu_i$ ($i = 1, 2, \ldots, k$). Also, $s'$ has the distribution of $(\chi_m^2/m)^{1/2}$, independent of the $y$'s.
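Now since the test is invariant under location transformations, we have

$$
\begin{aligned}
\beta = {} & \sum_{i=1}^{k-1} \int_0^\infty p_m(s) \int_{-\infty}^{\infty} p(z) \int_{z-\eta_i}^{z-\eta_i+Q's} p(t)\,dt \prod_{\substack{j=1\\ j\ne i}}^{k-1} \int_{z-\eta_i+\eta_j}^{z-\eta_i+\eta_j+Q's} p(t)\,dt\,dz\,ds \\
& + \int_0^\infty p_m(s) \int_{-\infty}^{\infty} p(z) \prod_{j=1}^{k-1} \int_{z+\eta_j}^{z+\eta_j+Q's} p(t)\,dt\,dz\,ds,
\end{aligned}
\tag{3.2}
$$

where

$$p(z) = \frac{1}{\sqrt{2\pi}}\,e^{-z^2/2}, \qquad p_m(s) = \text{const}\cdot s^{m-1} e^{-ms^2/2}, \qquad \eta_{j-1} = \mu_1' - \mu_j' \quad (j = 2, 3, \dots, k).$$

From (3.2) it is evident that $\beta$ involves as parameters only the $k - 1$ $\eta$'s. Hence the power ($= 1 - \beta$) of the q-test involves as parameters only the $k - 1$ $\eta$'s. It is worth noting at this point that the right side of (3.2) is symmetric in the $\eta$'s. Hence the power of the q-test is also symmetric in the $\eta$'s.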
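The parametrization behind (3.2) can be checked by simulation. The sketch below is a minimal Monte Carlo estimate of $\beta = \Pr\{y_{\max} - y_{\min} \le Q's'\}$, assuming NumPy; the function names, seeds and parameter values are illustrative choices, not part of the paper, and $Q'$ is itself estimated by simulation under $H_0$ rather than taken from tables. Because the same random draws are reused, shifting every mean by one constant leaves the estimate unchanged (up to rounding), which is the location invariance used above, while moving a single mean away lowers $\beta$.

```python
import numpy as np


def null_critical_value(k, m, alpha=0.05, reps=200_000, seed=2):
    """Simulated upper-alpha point Q' of (y_max - y_min)/s' when all means are equal."""
    rng = np.random.default_rng(seed)
    y = rng.normal(size=(reps, k))
    s_prime = np.sqrt(rng.chisquare(m, size=reps) / m)  # s' ~ (chi2_m / m)^(1/2)
    return np.quantile((y.max(axis=1) - y.min(axis=1)) / s_prime, 1.0 - alpha)


def beta_q_test(mu_prime, q_prime, m, reps=200_000, seed=1):
    """Monte Carlo estimate of beta = Pr{y_max - y_min <= Q' s'}, y_i ~ N(mu'_i, 1)."""
    rng = np.random.default_rng(seed)
    mu_prime = np.asarray(mu_prime, dtype=float)
    y = rng.normal(loc=mu_prime, size=(reps, mu_prime.size))
    s_prime = np.sqrt(rng.chisquare(m, size=reps) / m)
    return np.mean(y.max(axis=1) - y.min(axis=1) <= q_prime * s_prime)


k, m = 4, 20                                         # illustrative values
q_prime = null_critical_value(k, m)                  # Q' = sqrt(2) Q
beta_null = beta_q_test([0.0, 0.0, 0.0, 0.0], q_prime, m)
beta_shifted = beta_q_test([5.0, 5.0, 5.0, 5.0], q_prime, m)  # common location shift
beta_alt = beta_q_test([0.0, 0.0, 0.0, 1.5], q_prime, m)
print(beta_null, beta_shifted, beta_alt)
```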


4. Unbiased nature of the q-test. To prove the unbiased nature of the q-test we need to use certain lemmas, which we shall now prove.

Lemma 1.⁴ Suppose that

(1) in the domain $D: \{x: a_i \le x_i \le b_i,\ i = 1, 2, \dots, k\}$, $f(x_1, x_2, \dots, x_k)$ exists, all partial derivatives of order one and two exist, and all partial derivatives of order one vanish simultaneously at one and only one inner point $P: (x_{10}, x_{20}, \dots, x_{k0})$ of $D$;

(2) the matrix of second partials evaluated at $P$ is negative definite (n.d.); and

(3) at every point $(x_1, x_2, \dots, x_k)$ on the boundary of $D$, $f(x_1, x_2, \dots, x_k) < A$, where $A = f(x_{10}, x_{20}, \dots, x_{k0})$.


⁴ The author wishes to thank the referee for suggesting the present proof of Lemma 1.





TUKEY AND HARTLEY TESTS 827 
Then

$$f(x_1, x_2, \dots, x_k) < A \tag{4.1}$$

for all $x \in D$, $x \ne P$.
Proof. Because the domain is closed and bounded and the function is continuous, $\max f(x_1, x_2, \dots, x_k) = B$, say, exists. Suppose $B \ge A$ and that for

$$(x_1^*, x_2^*, \dots, x_k^*) \ne (x_{10}, x_{20}, \dots, x_{k0}), \qquad f(x_1^*, x_2^*, \dots, x_k^*) = B.$$

By Condition 3, $(x_1^*, x_2^*, \dots, x_k^*)$ is not on the boundary. By Condition 1, at least one partial derivative is not zero at $(x_1^*, x_2^*, \dots, x_k^*)$, say the derivative with respect to $x_1$. Suppose it is positive; i.e., that

$$\left.\frac{\partial f(x_1, x_2, \dots, x_k)}{\partial x_1}\right|_{x = x^*} > 0.$$

Then for sufficiently small $\delta > 0$, $f(x_1^* + \delta, x_2^*, \dots, x_k^*) > f(x_1^*, x_2^*, \dots, x_k^*) = B$, which is contrary to the assumption. (If the derivative is negative, take $\delta < 0$ instead.) Hence the lemma.

Lemma 2. If the conditions of Lemma 1 are satisfied as $a_i \to -\infty$ or $b_i \to \infty$, for any $i$ and for fixed values of $a_j, b_j$ ($j \ne i$; $j = 1, 2, \dots, k$), then $f(x_1, \dots, x_k) < A$ for all $x \in D'$: $\{x: -\infty < x_i < \infty,\ i = 1, \dots, k\}$, $x \ne P$.

Proof. The proof follows obviously from Lemma 1.

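Theorem 1. The Studentized range test of Tukey is unbiased.

Proof. Differentiating $\beta$ with respect to $\eta_1$ we get, after some simplification,

$$
\begin{aligned}
\frac{\partial\beta}{\partial\eta_1} = {} & \int_0^\infty p_m(s) \int_{-\infty}^{\infty} \{p(z)\,p(z+\eta_1+Q's) - p(z+\eta_1)\,p(z+Q's)\} \prod_{j=2}^{k-1} \int_{z+\eta_j}^{z+\eta_j+Q's} p(t)\,dt\,dz\,ds \\
& + \sum_{j=2}^{k-1} \int_0^\infty p_m(s) \int_{-\infty}^{\infty} \{p(z+\eta_j)\,p(z+\eta_1+Q's) - p(z+\eta_1)\,p(z+\eta_j+Q's)\} \\
& \qquad \times \prod_{\substack{l=2\\ l\ne j}}^{k-1} \int_{z+\eta_l}^{z+\eta_l+Q's} p(t)\,dt \int_{z}^{z+Q's} p(t)\,dt\,dz\,ds.
\end{aligned}
\tag{4.2}
$$

It is easy to check that the right side of (4.2) will be negative if $\eta_1 > 0$ and $\eta_1 > \eta_i$ ($i = 2, 3, \dots, k-1$), and positive if $\eta_1 < 0$ and $\eta_1 < \eta_i$ ($i = 2, 3, \dots, k-1$); this follows because, for $c = Q's > 0$, the ratio $p(x+c)/p(x) = e^{-cx - c^2/2}$ is decreasing in $x$, which fixes the sign of each brace. By the symmetry in the variables the same is true of $\partial\beta/\partial\eta_i$ ($i = 2, 3, \dots, k-1$); i.e.,

$$
\frac{\partial\beta}{\partial\eta_i} < 0 \ \text{ if } \eta_i > 0 \text{ and } \eta_i = \eta_{\max}, \qquad
\frac{\partial\beta}{\partial\eta_i} > 0 \ \text{ if } \eta_i < 0 \text{ and } \eta_i = \eta_{\min}.
\tag{4.3}
$$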







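Also it is evident that (the notation $\eta = 0$ will mean $\eta_1 = \eta_2 = \cdots = \eta_{k-1} = 0$)

$$\left.\frac{\partial\beta}{\partial\eta_1}\right|_{\eta=0} = 0. \tag{4.4}$$

Similarly,

$$\left.\frac{\partial\beta}{\partial\eta_i}\right|_{\eta=0} = 0 \qquad (i = 2, 3, \dots, k-1).$$

Now suppose $\eta \ne 0$. Then either $\eta_{\max} > 0$ or $\eta_{\min} < 0$. Hence the first partials can vanish simultaneously only at $(0, 0, \dots, 0)$.

Again it is easily verified that

$$\left.\frac{\partial^2\beta}{\partial\eta_i^2}\right|_{\eta=0} = -(k-1)\,Q'c(Q'), \tag{4.5}$$

where

$$c(Q') = \frac{1}{2\pi}\int_0^\infty s\,p_m(s) \int_{-\infty}^{\infty} \exp\left\{-\tfrac12\left[z^2 + (z+Q's)^2\right]\right\} \left[\int_z^{z+Q's} p(t)\,dt\right]^{k-2} dz\,ds > 0.$$

Hence

$$\left.\frac{\partial^2\beta}{\partial\eta_i^2}\right|_{\eta=0} < 0.$$

Also

$$\left.\frac{\partial^2\beta}{\partial\eta_i\,\partial\eta_j}\right|_{\eta=0} = Q'c(Q') > 0 \qquad (i \ne j;\ i, j = 1, 2, \dots, k-1). \tag{4.6}$$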


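Hence the matrix of second partials, when $\eta = 0$, is

$$
M = \left\|\frac{\partial^2\beta}{\partial\eta_i\,\partial\eta_j}\right\|_{\eta=0} =
\begin{pmatrix}
-(k-1)f(Q') & f(Q') & \cdots & f(Q') \\
f(Q') & -(k-1)f(Q') & \cdots & f(Q') \\
\vdots & \vdots & \ddots & \vdots \\
f(Q') & f(Q') & \cdots & -(k-1)f(Q')
\end{pmatrix},
$$

where $f(Q') = Q'c(Q')$. $M$ is negative definite, since its characteristic roots are $-f(Q')$ (once) and $-k\,f(Q')$ (with multiplicity $k - 2$), and $f(Q') > 0$.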
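The characteristic roots quoted for $M$ follow from writing $M = f(Q')(J - kI)$, with $J$ the $(k-1) \times (k-1)$ matrix of ones. A quick numerical check, assuming NumPy and arbitrary illustrative values of $k$ and of the positive constant (written `f` below; the same pattern, with $g(F)$ in place of $f(Q')$, appears in the matrix of second partials of Section 6):

```python
import numpy as np


def second_partials_pattern(k, f):
    """(k-1) x (k-1) matrix with -(k-1) f on the diagonal and f elsewhere."""
    dim = k - 1
    return f * (np.ones((dim, dim)) - k * np.eye(dim))


k, f = 5, 0.3                            # illustrative values, f > 0
M = second_partials_pattern(k, f)
roots = np.linalg.eigvalsh(M)            # ascending order
print(np.round(roots, 6))                # three roots -k f = -1.5 and one root -f = -0.3
```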

To complete the theorem it will now suffice if we show that $\beta \to 0$ on each point of the boundary of the domain $D: \{\eta: \epsilon_i \le \eta_i \le \lambda_i;\ i = 1, 2, \dots, k-1\}$ as, say, $\epsilon_1 \to -\infty$ or $\lambda_1 \to \infty$ for fixed values of $\epsilon_i, \lambda_i$ ($i = 2, 3, \dots, k-1$). Now it is easy to verify that as $\epsilon_1 \to -\infty$, the value of $\beta$ at each point on the boundary tends to 0. Similarly, it is easy to verify that as $\lambda_1 \to \infty$, the value of $\beta$ at each point on the boundary tends to 0. Also the value of $\beta$ at the point where the $\eta$'s $= 0$ is $1 - \alpha > 0$. Hence all the conditions given in Lemma 2 are satisfied by the function $\beta(\eta)$.





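Hence

$$\beta(\eta) < \beta(0) \quad \text{for every } \eta \ne 0; \tag{4.8}$$

that is, the power $1 - \beta(\eta)$ exceeds $\alpha$ whenever $\eta \ne 0$. Hence the Tukey q-test is unbiased.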


5. The Hartley $F_{\max}$ ratio test. Let $x_{ij}$ ($i = 1, 2, \dots, k$; $j = 1, 2, \dots, n+1$) be the elements of $k$ independent samples of size $(n+1)$ from normal populations with means $\mu_i$ and variances $\sigma_i^2$ ($i = 1, 2, \dots, k$). It is well known that $s_i^2 = \sum_{j=1}^{n+1}(x_{ij} - \bar x_i)^2/n$, where $\bar x_i = \sum_{j=1}^{n+1} x_{ij}/(n+1)$, is an unbiased estimate of $\sigma_i^2$ ($i = 1, 2, \dots, k$). It is also well known that $n s_i^2/\sigma_i^2$ is a chi-square variable with $n$ d.f.


The hypothesis $H_0: \sigma_1^2 = \sigma_2^2 = \cdots = \sigma_k^2$ is equivalent to the totality of hypotheses $H'_{ij}: \sigma_i^2 = \sigma_j^2$ ($i \ne j$; $i, j = 1, 2, \dots, k$). Now for any two $\sigma$'s, the hypothesis $\sigma_i^2 = \sigma_j^2$ can be tested using the variance ratio $F$ of Fisher with d.f. $(n, n)$. The hypothesis $\sigma_i^2 = \sigma_j^2$ is accepted if $1/F_\gamma \le s_i^2/s_j^2 \le F_\gamma$, where $F_\gamma$ is the upper $\gamma/2$ point of Fisher's $F$ with d.f. $(n, n)$. Now since $H_0$ is equivalent to the totality of the hypotheses $H'_{ij}$ ($i \ne j$; $i, j = 1, 2, \dots, k$), we get a test of $H_0$ as follows: take the intersection of all the $\binom{k}{2}$ Fisher's $F_{ij} = s_i^2/s_j^2$ acceptance regions and accept $H_0$ if

$$\text{largest } F_{ij} = \sup_{i \ne j;\ i, j = 1, 2, \dots, k} (s_i^2/s_j^2) \le F_\gamma .$$

It is easy to check that this is the same as accepting $H_0$ if $F_{\max} = s^2_{\max}/s^2_{\min} \le F$, where $F$ is the upper $\alpha$ point of the $F_{\max}$ distribution with d.f. $(n, n)$. (Notice that $F_\gamma = F$.) This is the Hartley $F_{\max}$ ratio test [1].
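As with the q-test, the equivalence of the two acceptance rules can be checked directly. The sketch below, assuming NumPy and simulated data with illustrative variances, sample size and seed, compares the largest of all pairwise ratios $s_i^2/s_j^2$ with $s^2_{\max}/s^2_{\min}$:

```python
import itertools

import numpy as np

rng = np.random.default_rng(3)
k, n = 4, 9                                   # illustrative: k samples of size n + 1 = 10
sigmas = np.array([[1.0], [1.5], [0.8], [2.0]])
x = rng.normal(scale=sigmas, size=(k, n + 1))
s2 = x.var(axis=1, ddof=1)                    # s_i^2 = sum_j (x_ij - xbar_i)^2 / n

# every ordered pair (i, j), so both s_i^2/s_j^2 and s_j^2/s_i^2 are examined
largest_Fij = max(a / b for a, b in itertools.permutations(s2, 2))
F_max = s2.max() / s2.min()
print(largest_Fij, F_max)
```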

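Starting with the definition of the $F_{\max}$ test, we have, since scale transformations leave the test invariant, the probability of the second kind of error

$$
\begin{aligned}
\beta = {} & \sum_{i=1}^{k-1} \int_0^\infty p(u) \int_{u/\eta_i}^{Fu/\eta_i} p(v)\,dv \prod_{\substack{j=1\\ j\ne i}}^{k-1} \int_{u\eta_j/\eta_i}^{Fu\eta_j/\eta_i} p(w)\,dw\,du \\
& + \int_0^\infty p(u) \prod_{j=1}^{k-1} \int_{u\eta_j}^{Fu\eta_j} p(v)\,dv\,du,
\end{aligned}
\tag{5.1}
$$

where

$$p(u) = \text{const}\cdot u^{(n/2)-1} e^{-u/2} \qquad \text{and} \qquad \eta_{j-1} = \sigma_1^2/\sigma_j^2 \quad (j = 2, 3, \dots, k).$$

From (5.1) it is evident that $\beta$ involves as parameters only the $k - 1$ $\eta$'s. Hence the power ($= 1 - \beta$) of the test involves as parameters only the $k - 1$ $\eta$'s. It is worth noting at this point that the right side of (5.1) is symmetric in the $\eta$'s. Hence the power of the test is also symmetric in the $\eta$'s.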
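The scale invariance behind (5.1) can also be checked by simulation. The sketch below assumes NumPy; the critical value `F = 6.0`, the d.f. and the variances are hypothetical illustrative numbers rather than tabulated values. It draws $n s_i^2/\sigma_i^2$ as independent $\chi^2_n$ variables and estimates $\beta = \Pr\{s^2_{\max}/s^2_{\min} \le F\}$; multiplying every $\sigma_i^2$ by the same constant leaves the estimate unchanged (up to rounding), since only the ratios $\eta$ enter.

```python
import numpy as np


def beta_fmax(sigma2, F, n, reps=100_000, seed=4):
    """Monte Carlo estimate of Pr{s2_max / s2_min <= F}, with n s_i^2 / sigma_i^2 ~ chi2_n."""
    rng = np.random.default_rng(seed)
    sigma2 = np.asarray(sigma2, dtype=float)
    s2 = sigma2 * rng.chisquare(n, size=(reps, sigma2.size)) / n
    return np.mean(s2.max(axis=1) / s2.min(axis=1) <= F)


F, n = 6.0, 10                                      # hypothetical critical value and d.f.
beta_equal = beta_fmax([1.0, 1.0, 1.0], F, n)       # all eta's equal to 1
beta_unequal = beta_fmax([1.0, 2.0, 4.0], F, n)
beta_rescaled = beta_fmax([3.0, 6.0, 12.0], F, n)   # same ratios, common factor 3
print(beta_equal, beta_unequal, beta_rescaled)
```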


6. Unbiased nature of the $F_{\max}$ test. To prove the unbiased nature of the $F_{\max}$ test we need to use the following lemma.

Lemma 3. If the conditions of Lemma 1 are satisfied as $a_i \to 0$ or $b_i \to \infty$ for any $i$ and for fixed values of $a_j, b_j$ ($j \ne i$; $j = 1, 2, \dots, k$), then $f(x_1, \dots, x_k) < A$ for all $x \in D'$: $\{x: 0 < x_i < \infty,\ i = 1, \dots, k\}$, $x \ne P$.

Proof. The proof follows obviously from Lemma 1.

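Theorem 2. The $F_{\max}$ test of Hartley is unbiased.

Proof. Differentiating $\beta$ with respect to $\eta_1$ we get, after some simplification,

$$
\begin{aligned}
\frac{\partial\beta}{\partial\eta_1} = \frac{F^{n/2}\,\eta_1^{(n/2)-1}}{2^n\,\Gamma^2(n/2)} \Bigg[ & \int_0^\infty u^{n-1}\left\{e^{-u(1+F\eta_1)/2} - e^{-u(F+\eta_1)/2}\right\} \prod_{j=2}^{k-1} \int_{u\eta_j}^{Fu\eta_j} p(v)\,dv\,du \\
& + \sum_{j=2}^{k-1} \eta_j^{n/2} \int_0^\infty u^{n-1}\left\{e^{-u(\eta_j+F\eta_1)/2} - e^{-u(\eta_1+F\eta_j)/2}\right\} \\
& \qquad \times \prod_{\substack{l=2\\ l\ne j}}^{k-1} \int_{u\eta_l}^{Fu\eta_l} p(v)\,dv \int_u^{Fu} p(v)\,dv\,du \Bigg],
\end{aligned}
\tag{6.1}
$$

where

$$p(v) = \text{const}\cdot v^{(n/2)-1} e^{-v/2}, \qquad \text{const} = [2^{n/2}\Gamma(n/2)]^{-1}.$$

It is easy to check that the right side of (6.1) will be negative if $\eta_1 > 1$ and $\eta_1 > \eta_i$ ($i = 2, 3, \dots, k-1$), and positive if $\eta_1 < 1$ and $\eta_1 < \eta_i$ ($i = 2, 3, \dots, k-1$); this follows because $(1 + F\eta_1) - (F + \eta_1) = (F-1)(\eta_1 - 1)$ and $(\eta_j + F\eta_1) - (\eta_1 + F\eta_j) = (F-1)(\eta_1 - \eta_j)$, with $F > 1$, fix the sign of each brace. By the symmetry in the variables, the same is true of $\partial\beta/\partial\eta_i$ ($i = 2, 3, \dots, k-1$); i.e.,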


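$$
\frac{\partial\beta}{\partial\eta_i} < 0 \ \text{ if } \eta_i > 1 \text{ and } \eta_i = \eta_{\max}, \qquad
\frac{\partial\beta}{\partial\eta_i} > 0 \ \text{ if } \eta_i < 1 \text{ and } \eta_i = \eta_{\min}.
\tag{6.2}
$$

Also it is evident that (the notation $\eta = 1$ will mean $\eta_1 = \eta_2 = \cdots = \eta_{k-1} = 1$)

$$\left.\frac{\partial\beta}{\partial\eta_1}\right|_{\eta=1} = 0. \tag{6.3}$$

Similarly,

$$\left.\frac{\partial\beta}{\partial\eta_i}\right|_{\eta=1} = 0 \qquad (i = 2, 3, \dots, k-1).$$

Now suppose $\eta \ne 1$. Then either $\eta_{\max} > 1$ or $\eta_{\min} < 1$. Hence the first partials can vanish simultaneously only at $(1, 1, \dots, 1)$. Again it is easily verified that

$$\left.\frac{\partial^2\beta}{\partial\eta_i^2}\right|_{\eta=1} = (k-1)(1-F)\,c(F), \tag{6.4}$$

where

$$c(F) = \frac{F^{n/2}}{2^{n+1}\,\Gamma^2(n/2)} \int_0^\infty u^{n} e^{-u(1+F)/2} \left[\int_u^{Fu} p(v)\,dv\right]^{k-2} du > 0.$$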







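Hence

$$\left.\frac{\partial^2\beta}{\partial\eta_i^2}\right|_{\eta=1} < 0. \tag{6.5}$$

Also

$$\left.\frac{\partial^2\beta}{\partial\eta_i\,\partial\eta_j}\right|_{\eta=1} = (F-1)\,c(F) > 0 \qquad (i \ne j;\ i, j = 1, 2, \dots, k-1). \tag{6.6}$$

Hence the matrix of second partials, when $\eta = 1$, is

$$
\left\|\frac{\partial^2\beta}{\partial\eta_i\,\partial\eta_j}\right\|_{\eta=1} =
\begin{pmatrix}
-(k-1)g(F) & g(F) & \cdots & g(F) \\
g(F) & -(k-1)g(F) & \cdots & g(F) \\
\vdots & \vdots & \ddots & \vdots \\
g(F) & g(F) & \cdots & -(k-1)g(F)
\end{pmatrix},
\tag{6.7}
$$

where $g(F) = (F-1)c(F)$; this matrix is negative definite.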

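To complete the theorem it will now suffice if we show that $\beta \to 0$ on each point of the boundary of the domain $D: \{\eta: \epsilon_i \le \eta_i \le \lambda_i;\ i = 1, 2, \dots, k-1\}$ as, say, $\epsilon_1 \to 0$ or $\lambda_1 \to \infty$ for fixed values of $\epsilon_i, \lambda_i$ ($i = 2, 3, \dots, k-1$). It is easy to verify that as $\epsilon_1 \to 0$ the value of $\beta$ at each point on the boundary tends to 0. Similarly, it is easy to verify that as $\lambda_1 \to \infty$, the value of $\beta$ at each point on the boundary tends to 0. Also the value of $\beta$ at the point where the $\eta$'s $= 1$ is $1 - \alpha > 0$. Hence all the conditions given in Lemma 3 are satisfied by the function $\beta(\eta)$. Hence

$$\beta(\eta) < \beta(1) \quad \text{for every } \eta \ne 1, \tag{6.8}$$

where $\beta(1)$ denotes the value of $\beta$ at $\eta_1 = \cdots = \eta_{k-1} = 1$. Hence the Hartley $F_{\max}$ test is unbiased.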


7. Conclusion. So far we have considered the $F_{\max}$ test when all the $s_i^2$'s are based on the same number of d.f., $n$. Investigation is proceeding on the behaviour of the $F_{\max}$ test when the d.f. are unequal. Power properties of similar generalizations of the q-test are also being investigated.

By inverting the test procedures considered in Sections 3 and 5, useful simultaneous confidence bounds on all two-by-two differences of the means and all two-by-two ratios of the variances can be obtained.
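Inverting the acceptance region of Section 3 gives, for every pair $i < j$, the interval $\bar x_i - \bar x_j \pm Q s\sqrt{2/n}$, all $\binom{k}{2}$ intervals holding simultaneously with probability $1 - \alpha$. A minimal sketch, assuming NumPy; the function name and the numbers in the usage lines are illustrative, and $Q$ must be supplied from tables of the distribution of $q$ as scaled in Section 3 (it is not computed here):

```python
import itertools

import numpy as np


def tukey_simultaneous_intervals(means, s, n, Q):
    """Intervals (xbar_i - xbar_j) +/- Q s sqrt(2/n) for all pairs i < j."""
    half_width = Q * s * np.sqrt(2.0 / n)
    return {
        (i, j): (means[i] - means[j] - half_width, means[i] - means[j] + half_width)
        for i, j in itertools.combinations(range(len(means)), 2)
    }


means = [10.2, 11.0, 9.5]    # illustrative group means
s, n, Q = 1.2, 6, 2.5        # illustrative pooled s, group size and critical value
intervals = tukey_simultaneous_intervals(means, s, n, Q)
# H0 is accepted exactly when every interval covers zero, i.e. when q <= Q
all_cover_zero = all(lo <= 0.0 <= hi for lo, hi in intervals.values())
q = (max(means) - min(means)) / (s * np.sqrt(2.0 / n))
print(intervals, all_cover_zero, q <= Q)
```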


8. Acknowledgement. The author wishes to express his indebtedness to Professor S. N. Roy for suggesting this problem and for his help and guidance in the preparation of this paper.


REFERENCES
[1] H. O. Hartley, "The maximum F-ratio as a short cut test for heterogeneity of variance," Biometrika, Vol. 37 (1950), pp. 308-312.
[2] S. N. Roy, "On a heuristic method of test construction and its use in multivariate analysis," Ann. Math. Stat., Vol. 24 (1953), pp. 220-238.
[3] J. W. Tukey, "Allowances for various types of error rates" (unpublished invited address, Blacksburg meeting of the Institute of Mathematical Statistics, March, 1952).





MURRAY ROSENBLATT 


REMARKS ON SOME NONPARAMETRIC ESTIMATES OF
A DENSITY FUNCTION¹


By MURRAY ROSENBLATT²
University of Chicago
1. Summary. This note discusses some aspects of the estimation of the density function of a univariate probability distribution. All estimates of the density function satisfying relatively mild conditions are shown to be biased. The asymptotic mean square error of a particular class of estimates is evaluated.


2. Estimates of the density function. Let $X_1, \dots, X_n$ be independent and identically distributed random variables with continuous density function $f(y)$. Let $S(y; X_1, \dots, X_n)$ be an estimate of $f(y)$. The function $S(y; x_1, \dots, x_n)$ is assumed to be jointly Borel measurable in $(y, x_1, \dots, x_n)$. It is also assumed that

$$S(y; x_1, \dots, x_n) \ge 0,$$

since $f(y) \ge 0$.
It can easily be shown that $S(y; X_1, \dots, X_n)$ is not an unbiased estimate of $f(y)$. Suppose, to the contrary, that

$$E\,S(y; X_1, \dots, X_n) = f(y) \tag{1}$$

for all continuous $f$ and all $y$. Condition (1) implies that for each $y$, $E\,S(y; X_1, \dots, X_n) < \infty$.


Assume that $S(y; x_1, \dots, x_n)$ is a symmetric function of $x_1, \dots, x_n$, since the symmetrized n-tuple $(X_1, \dots, X_n)$ is a sufficient statistic for the problem. But then

$$\int_a^b S(y; X_1, \dots, X_n)\,dy \tag{2}$$

is a symmetric estimate of

$$F(b) - F(a) = \int_a^b f(y)\,dy.$$

Moreover, (2) is an unbiased estimate of $F(b) - F(a)$, since

$$E\int_a^b S(y; X_1, \dots, X_n)\,dy = \int_a^b E\,S(y; X_1, \dots, X_n)\,dy = \int_a^b f(y)\,dy = F(b) - F(a)$$


Received April 27, 1955.

¹ Research carried out at the Statistical Research Center, University of Chicago, under the sponsorship of the Statistics Branch, Office of Naval Research.

The comments of R. R. Bahadur have been very helpful.

² Now at Indiana University.







NONPARAMETRIC ESTIMATES 833 


by Fubini's theorem. However, the only unbiased estimate of $F(b) - F(a)$ symmetric in the observations $X_1, \dots, X_n$ is $F_n(b) - F_n(a)$, where $F_n(y)$ is the sample distribution function. This follows immediately from the fact that the symmetrized n-tuple $(X_1, \dots, X_n)$ is a complete statistic [2]. Thus,

$$F_n(b) - F_n(a) = \int_a^b S(y; X_1, \dots, X_n)\,dy$$

for all $a$ and $b$ and almost all $X_1, \dots, X_n$. But then $F_n(y)$ is absolutely continuous in $y$ for almost all $X_1, \dots, X_n$, which is impossible, since $F_n$ is a step function with a jump of $1/n$ at each observation.
One need not require $S(y; x_1, \dots, x_n)$ to be nonnegative. An assumption like

$$E\int_a^b |S(y; X_1, \dots, X_n)|\,dy < \infty$$

for some two values $a, b$, with $a < b$, would lead to the same conclusion, that is, that there are no unbiased estimates $S(y; X_1, \dots, X_n)$ of $f(y)$ satisfying this condition.


3. The difference quotient of the sample distribution function. An obvious estimate of $f(y)$ is the difference quotient

$$S(y; X_1, \dots, X_n) = f_n(y) = \frac{F_n(y+h) - F_n(y-h)}{2h}$$

of the sample distribution function $F_n(y)$, where $h = h_n$ is a function of the sample size $n$ and approaches zero as $n \to \infty$. The asymptotic behavior of this estimate as $n \to \infty$ is examined in terms of its mean square error. Fix and Hodges have used an estimate of this form in their discussion of a nonparametric discrimination problem [1].
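The estimate $f_n(y)$ needs only the sample distribution function at $y \pm h$. A minimal sketch, assuming NumPy; the function name, the simulated standard normal sample and the bandwidth $h = n^{-1/5}$ (that is, the constant taken as 1, purely for illustration) are not from the text:

```python
import numpy as np


def difference_quotient_density(y, sample, h):
    """f_n(y) = [F_n(y + h) - F_n(y - h)] / (2h), F_n the sample distribution function."""
    x = np.sort(np.asarray(sample, dtype=float))
    n = x.size
    y = np.asarray(y, dtype=float)
    # F_n(t) = #{X_j <= t} / n, computed by a right-sided binary search
    F_plus = np.searchsorted(x, y + h, side="right") / n
    F_minus = np.searchsorted(x, y - h, side="right") / n
    return (F_plus - F_minus) / (2.0 * h)


rng = np.random.default_rng(5)
sample = rng.standard_normal(20_000)
h = sample.size ** (-1 / 5)              # illustrative bandwidth h_n = n^(-1/5)
est = difference_quotient_density(0.0, sample, h)
true_value = 1.0 / np.sqrt(2.0 * np.pi)  # standard normal density at 0
print(est, true_value)
```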

Now,

$$E F_n(y) = F(y),$$

$$E[F_n(y)F_n(y')] = \frac{1}{n}F(\min(y, y')) + \frac{n-1}{n}F(y)F(y'),$$

so that

$$\operatorname{cov}(F_n(y), F_n(y')) = \frac{1}{n}\left[F(\min(y, y')) - F(y)F(y')\right].$$

But then,

$$
\begin{aligned}
\operatorname{cov}(f_n(y), f_n(y')) = \frac{1}{4h^2 n}\{ & F(\min(y+h, y'+h)) - F(y+h)F(y'+h) \\
& - F(\min(y+h, y'-h)) + F(y+h)F(y'-h) - F(\min(y-h, y'+h)) \\
& + F(y-h)F(y'+h) + F(\min(y-h, y'-h)) - F(y-h)F(y'-h)\}.
\end{aligned}
$$

On setting $y = y'$,

$$\sigma^2(f_n(y)) = \frac{1}{4h^2 n}\left\{F(y+h) - F(y-h) - [F(y+h) - F(y-h)]^2\right\}.$$







Now consider the behavior of $f_n(y)$, where $y$ is fixed, as $n \to \infty$ and $h \to 0$. The mean square error

$$
\begin{aligned}
E|f_n(y) - f(y)|^2 &= \sigma^2(f_n(y)) + (E f_n(y) - f(y))^2 \\
&= \frac{1}{4h^2 n}\left\{F(y+h) - F(y-h) - [F(y+h) - F(y-h)]^2\right\} + \left\{\frac{1}{2h}[F(y+h) - F(y-h)] - f(y)\right\}^2
\end{aligned}
$$

is a reasonable measure of how good an estimate $f_n(y)$ is of $f(y)$ locally at $y$. The density function $f$ is assumed to be sufficiently regular for the following evaluation of the mean square error to be carried through. It will be enough to assume that the first three derivatives of $f$ exist at $y$. Then

$$
\begin{aligned}
F(y+h) - F(y-h) &= \int_{y-h}^{y+h} f(u)\,du \\
&= \int_{y-h}^{y+h} \left\{f(y) + (u-y)f'(y) + \tfrac12 (u-y)^2 f''(y) + O(|u-y|^3)\right\} du \\
&= 2h f(y) + \tfrac13 f''(y) h^3 + O(h^4).
\end{aligned}
$$


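Assume that $f''(y) \ne 0$. Then

$$(E f_n(y) - f(y))^2 \sim \frac{h^4}{36}|f''(y)|^2$$

as $h \to 0$. The variance of the estimate satisfies

$$\sigma^2(f_n(y)) \sim \frac{f(y)}{2hn}$$

as $h \to 0$. The asymptotic mean square error is

$$E|f_n(y) - f(y)|^2 = \frac{f(y)}{2hn} + \frac{h^4}{36}|f''(y)|^2 + o\left(\frac{1}{hn} + h^4\right) \tag{3}$$

as $h \to 0$ and $n \to \infty$. The question of an optimal choice of $h = h_n$ as a function of $n$ now arises. If $h$ is set equal to $kn^{-\alpha}$, $\alpha > 0$, it is easily seen from (3) that the optimal choice of $\alpha$ is $\alpha = 1/5$, since the two leading terms, of orders $n^{\alpha-1}$ and $n^{-4\alpha}$, are then balanced, both being of order $n^{-4/5}$. The optimal value of $k$ is then the one minimizing

$$\frac{f(y)}{2k} + \frac{k^4}{36}|f''(y)|^2.$$

This value of $k$ is

$$k = \left[\frac{9 f(y)}{2|f''(y)|^2}\right]^{1/5}.$$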







With this choice of $k$ and $\alpha$, we find that

$$E|f_n(y) - f(y)|^2 \sim \frac54\, 9^{-1/5}\, 2^{-4/5}\, f(y)^{4/5}\, |f''(y)|^{2/5}\, n^{-4/5}.$$

The choice of $k$ would be based on guesses as to the magnitude of $f(y)$, $f''(y)$.
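The optimal constant and the limiting mean square error can be checked numerically. The sketch below, assuming NumPy, evaluates the coefficient of $n^{-4/5}$ in (3) when $h = kn^{-1/5}$, and confirms that $k = [9f(y)/(2|f''(y)|^2)]^{1/5}$ minimizes it and gives the stated constant. Taking $f$ to be the standard normal density at $y = 0$ (so that $f''(0) = -f(0)$) is an illustrative assumption.

```python
import numpy as np


def optimal_k(f_y, f2_y):
    """k minimizing f(y)/(2k) + k^4 f''(y)^2 / 36 (requires f''(y) != 0)."""
    return (9.0 * f_y / (2.0 * f2_y**2)) ** 0.2


def mse_coefficient(k, f_y, f2_y):
    """Coefficient of n^(-4/5) in f/(2hn) + h^4 f''^2 / 36 when h = k n^(-1/5)."""
    return f_y / (2.0 * k) + k**4 * f2_y**2 / 36.0


f0 = 1.0 / np.sqrt(2.0 * np.pi)          # standard normal f(0)
f2 = -f0                                 # standard normal f''(0)
k_star = optimal_k(f0, f2)
closed_form = 1.25 * 9**-0.2 * 2**-0.8 * f0**0.8 * abs(f2) ** 0.4
grid = np.linspace(0.5 * k_star, 2.0 * k_star, 2001)
print(k_star, mse_coefficient(k_star, f0, f2), closed_form)
```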

One is led to a choice of $h$ as a function of $n$ that is independent of $y$ by considering a global measure of how good $f_n$ is as an estimate of $f$. The integrated mean square error

$$\int E|f_n(y) - f(y)|^2\,dy$$

is a simple measure of this type. Let $f(y)$, $f''(y)$ be bounded continuous functions that are square integrable. One is then led to the following asymptotic expression

$$\int E|f_n(y) - f(y)|^2\,dy = \frac{1}{2hn} + \frac{h^4}{36}\int |f''(y)|^2\,dy + o\left(\frac{1}{hn} + h^4\right)$$

as $h \to 0$ and $n \to \infty$. The optimal choice of $h$ as a function of $n$ is

$$h = k n^{-1/5},$$

where $k$ is now

$$k = \left[\frac{9}{2\int |f''(y)|^2\,dy}\right]^{1/5},$$

and

$$\int E|f_n(y) - f(y)|^2\,dy \sim \frac54\, 9^{-1/5}\, 2^{-4/5} \left[\int |f''(y)|^2\,dy\right]^{1/5} n^{-4/5}$$

as $n \to \infty$.


4. A class of estimates of the density function. The discussion of the previous section suggests that the following class of estimates will be of interest. Let $w_n(u)$ be a nonnegative function such that

$$\int_{-\infty}^{\infty} w_n(u)\,du = 1.$$

The sequence of functions $\{w_n(u)\}$ is chosen so that the total mass concentrates in the neighborhood of zero as $n \to \infty$; that is, given any $\epsilon > 0$,

$$\int_{|u| < \epsilon} w_n(u)\,du \to 1$$

as $n \to \infty$. Corresponding to each sequence of weight functions $\{w_n(u)\}$ of this type, there is an estimate

$$f_n(y) = \int_{-\infty}^{\infty} w_n(y - u)\,dF_n(u) = \frac{1}{n}\sum_{j=1}^{n} w_n(y - X_j).$$





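Now

$$E f_n(y) = \int w_n(y - x)\,dF(x) = \int w_n(y - x) f(x)\,dx,$$

and

$$\operatorname{cov}(f_n(y), f_n(y')) = \frac{1}{n}\left[\int_{-\infty}^{\infty} w_n(y - x) w_n(y' - x) f(x)\,dx - \int_{-\infty}^{\infty} w_n(y - x) f(x)\,dx \int_{-\infty}^{\infty} w_n(y' - x) f(x)\,dx\right].$$

On setting $y = y'$, we have

$$\sigma^2(f_n(y)) = \frac{1}{n}\left[\int_{-\infty}^{\infty} w_n^2(y - x) f(x)\,dx - \left(\int_{-\infty}^{\infty} w_n(y - x) f(x)\,dx\right)^2\right].$$

Note that all estimates of this form are themselves density functions; that is,

$$f_n(y) \ge 0,$$

and

$$\int_{-\infty}^{\infty} f_n(y)\,dy = 1.$$

An estimate $f_n(y)$ with any desired regularity properties can be obtained by choosing a weight function $w_n(u)$ with these same regularity properties. Thus, $f_n(y)$ will be analytic if $w_n(u)$ is.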

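As an example, consider

$$w_n(u) = \frac{1}{h}\,w\!\left(\frac{u}{h}\right),$$

where $h = h_n \to 0$ as $n \to \infty$, and

$$\int w(u)\,du = 1.$$

The estimate discussed in the previous section is obtained on setting

$$w(u) = \begin{cases} \tfrac12 & \text{when } |u| < 1, \\ 0 & \text{otherwise.} \end{cases}$$

The function $w(u)$ is assumed to be square integrable. Then

$$\sigma^2(f_n(y)) = \frac{1}{hn}\left[\int_{-\infty}^{\infty} w^2(u) f(y + hu)\,du - h\left(\int_{-\infty}^{\infty} w(u) f(y + hu)\,du\right)^2\right]$$

and

$$E f_n(y) - f(y) = \int w(u) f(y + hu)\,du - f(y).$$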
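A sketch of the estimate $f_n(y) = n^{-1}\sum_j w_n(y - X_j)$ with the scaled weight $w_n(u) = h^{-1}w(u/h)$, assuming NumPy. The standard normal density as $w$, the exponential sample, the bandwidth and the grid are illustrative choices only; the check confirms numerically that the estimate is nonnegative and integrates to one, as noted above for every estimate of this form.

```python
import numpy as np


def kernel_density(y, sample, h, w):
    """f_n(y) = (1/n) sum_j w_n(y - X_j) with w_n(u) = w(u/h)/h."""
    y = np.atleast_1d(np.asarray(y, dtype=float))
    x = np.asarray(sample, dtype=float)
    return w((y[:, None] - x[None, :]) / h).mean(axis=1) / h


def gaussian_weight(u):
    """Illustrative choice of w: the standard normal density (analytic, symmetric)."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)


rng = np.random.default_rng(6)
sample = rng.exponential(size=500)
grid = np.linspace(-5.0, 15.0, 8001)
fhat = kernel_density(grid, sample, h=0.3, w=gaussian_weight)
total_mass = fhat.sum() * (grid[1] - grid[0])   # Riemann sum over the grid
print(total_mass, fhat.min())
```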










It is clear that

$$\int w(u) f(y + hu)\,du \to f(y),$$

and that

$$\int w^2(u) f(y + hu)\,du \to f(y)\int w^2(u)\,du$$

as $h \to 0$ for every continuous density function $f$. Hence,

$$\sigma^2(f_n(y)) \sim \frac{1}{hn}\,f(y)\int w^2(u)\,du.$$

The integral

$$\int w(u)\,|u|^3\,du$$

is assumed to be finite, and $f(y)$ is assumed to have continuous derivatives of the first three orders in the following computation of the bias. The bias of the estimate is then

$$
\begin{aligned}
E f_n(y) - f(y) &= \int w(u)[f(y + hu) - f(y)]\,du \\
&= h f'(y)\int w(u)\,u\,du + \tfrac12 h^2 f''(y)\int w(u)\,u^2\,du + O(|h|^3).
\end{aligned}
$$

It is now clear that it would be advantageous to have

$$\int w(u)\,u\,du = 0.$$

This condition will be satisfied if $w(u)$ is symmetric about zero. Using the same sort of argument as was used in the last section, it is easily seen that the mean square error of these estimates cannot be made of smaller order than $n^{-4/5}$ for all admissible $f$. It would be very interesting to find out whether there are other estimates $f_n(y)$ with an asymptotic behavior of the order $1/n$ as $n \to \infty$.
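For symmetric $w$, the variance and bias expansions combine into the leading mean square error terms $f(y)\int w^2\,du/(hn) + [\tfrac12 h^2 f''(y)\int u^2 w\,du]^2$. The sketch below, assuming NumPy, computes the kernel constants by a Riemann sum; for the box weight $w(u) = \tfrac12$ on $|u| < 1$ it recovers $\int w^2 = 1/2$ and $\int u^2 w = 1/3$, and hence the expression $f(y)/(2hn) + h^4|f''(y)|^2/36$ of (3). The grid limits and the numbers used in the check are illustrative.

```python
import numpy as np


def kernel_constants(w, lo=-10.0, hi=10.0, num=200_001):
    """Riemann-sum approximations of int w, int u w, int u^2 w and int w^2."""
    u = np.linspace(lo, hi, num)
    du = u[1] - u[0]
    wu = w(u)
    return {
        "mass": float(np.sum(wu) * du),
        "first_moment": float(np.sum(u * wu) * du),
        "second_moment": float(np.sum(u**2 * wu) * du),
        "w_squared": float(np.sum(wu**2) * du),
    }


def box(u):
    """w(u) = 1/2 for |u| < 1 and 0 otherwise (the difference quotient of Section 3)."""
    return np.where(np.abs(u) < 1.0, 0.5, 0.0)


def leading_mse(h, n, f_y, f2_y, c):
    """f(y) int w^2 / (h n) + (h^2 f''(y) int u^2 w / 2)^2, valid when int u w = 0."""
    return f_y * c["w_squared"] / (h * n) + (0.5 * h**2 * f2_y * c["second_moment"]) ** 2


c = kernel_constants(box)
print(c)
```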


REFERENCES


[1] E. Fix and J. L. Hodges, Jr., Discriminatory Analysis, Nonparametric Discrimination: Consistency Properties, USAF School of Aviation Medicine, Project No. 21-49-004, Report No. 4.

[2] E. Lehmann, "Notes on the theory of estimation," University of California, Berkeley (1950).





D. A. S. FRASER 


SUFFICIENT STATISTICS WITH NUISANCE PARAMETERS
By D. A. S. FRASER
Princeton University¹ and the University of Toronto
1. Summary. For some problems involving a parameter of interest and a nuisance parameter, it is possible to define a statistic sufficient for the parameter of interest. The definition has a number of applications in nonparametric theory. Two theorems are derived and used by way of illustration to prove that the sign test is a uniformly most powerful test for the nonparametric form of the single sample problem of location.


2. The definition. In problems of estimation and hypothesis testing, it often happens that one parameter in particular is of interest, whereas other parameters present are nuisance parameters. For some of these problems a generalized definition of sufficiency can be applied. Let $X$ be a random variable over the measurable space $\mathcal{X}(\mathcal{A})$ and let $\{P_{\theta\eta} \mid (\theta, \eta) \in \Theta \times H\}$ be the class of possible probability measures for $X$. Also, let $t(x)$ be a statistic mapping $\mathcal{X}(\mathcal{A})$ into the measurable space $\mathcal{T}(\mathcal{B})$ and let $P^t_{\theta\eta}$ designate the measure on $\mathcal{T}(\mathcal{B})$ induced by $t(x)$ from the measure $P_{\theta\eta}$ over $\mathcal{X}(\mathcal{A})$. Then we propose the following extension of the concept of sufficiency: $t(x)$ is a sufficient statistic $(\theta)$ for the class of measures $\{P_{\theta\eta} \mid (\theta, \eta) \in \Theta \times H\}$ if there exists a function $P_\eta(A \mid t)$ such that

$$P_{\theta\eta}(A \cap t^{-1}(B)) = \int_B P_\eta(A \mid t)\,dP^t_\theta(t) \tag{1}$$

for all $A \in \mathcal{A}$, $B \in \mathcal{B}$, where the induced measure of $t(x)$, $P^t_\theta$, is independent of $\eta$.

The conditional probability that $X$ falls in the set $A$ given $t(X) = t$ is given by a function which will serve as the integrand in the integral of (1). The definition says that this conditional probability must depend only on the nuisance parameter $\eta$, and that the marginal distribution of the statistic $t(x)$ should depend only on the parameter of interest $\theta$. Thus it can be seen intuitively that the statistic $t(x)$ is in a general sense sufficient for problems concerning the parameter $\theta$.

For the particular case in which there are no nuisance parameters, this definition reduces to the ordinary definition of sufficient statistic. However, there need not exist a sufficient statistic $(\theta)$, whereas there always exists a sufficient statistic by the usual definition. Another drawback to the formulation above is the requirement that the parameter space be a Cartesian product. Other cases can sometimes be treated by a transformation of parameters.

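3. The theorems. For the probability model defined above, consider the following hypothesis testing problem involving in effect only the parameter $\theta$:

$$
\begin{aligned}
&\text{Hypothesis: } \theta \in \omega, \quad \eta \in H; \\
&\text{Alternative: } \theta \in \Theta - \omega, \quad \eta \in H.
\end{aligned}
\tag{2}
$$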

Received May 31, 1955. 

¹ This work was supported in part by the Office of Naval Research.





NUISANCE PARAMETERS 839 


If there is a statistic sufficient for $\theta$, then the following theorem proves that in a certain sense we need only consider test functions which can be expressed as functions of $t(x)$.

Theorem 1. If $\phi(x)$ is a size $\alpha$ test function for the problem (2), and if $t(x)$ is sufficient $(\theta)$, then there is a size $\alpha$ test function $\psi(t(x))$ for the problem; its power function depends only on $\theta$, and for each $\theta$ it has power at least as large as

$$\inf_{\eta \in H} P_\phi(\theta, \eta), \tag{3}$$

the minimum power of $\phi(x)$ for that $\theta$. The power $P_\phi(\theta, \eta)$ for the test $\phi(x)$ is defined by

$$P_\phi(\theta, \eta) = \int_{\mathcal{X}} \phi(x)\,dP_{\theta\eta}(x). \tag{4}$$


Proof. Take any $\eta$, say $\eta_0$, and define

$$\psi(t) = E_{\eta_0}\{\phi(X) \mid t(X) = t\},$$

the conditional expectation of $\phi(x)$, given that $t(x) = t$; $\psi(t)$ does not depend on $\theta$ or $\eta$. From the relation (1) and the fact that a conditional expectation can be defined in terms of the conditional probability, it follows that the expectation is independent of $\theta$. $\psi(t)$ is determined except on a set having $P^t_\theta$ measure zero, and satisfies almost everywhere $(P^t_\theta)$ the same bounds $0 \le \psi(t) \le 1$ satisfied by $\phi(x)$. We then choose $\psi(t)$ to satisfy these bounds everywhere; that is, we make $\psi(t)$ a test function. The power function of $\psi(t(x))$ is given by

$$P_\psi(\theta, \eta) = E_{\theta\eta}\{\psi(t(X))\} = E^t_\theta\{\psi(T)\},$$

and is seen to depend only on $\theta$. Now using (4) we obtain

$$
\begin{aligned}
P_\psi(\theta) &= E^t_\theta\{\psi(T)\} \\
&= E^t_\theta\{E_{\eta_0}\{\phi(X) \mid t(X) = T\}\} \\
&= E_{\theta\eta_0}\{\phi(X)\} \\
&= P_\phi(\theta, \eta_0),
\end{aligned}
$$

and then it easily follows that

$$\inf_{\eta \in H} P_\phi(\theta, \eta) \le P_\psi(\theta) \le \sup_{\eta \in H} P_\phi(\theta, \eta). \tag{5}$$

With $\theta$ taking values through $\omega$, (5) proves that $\psi(t(x))$ is a size $\alpha$ test. For $\theta \in \Theta - \omega$, (5) proves (3).
A closely related theorem is the following:

Theorem 2. If $t(x)$ is sufficient $(\theta)$ for the class of measures

$$\{P_{\theta\eta} \mid (\theta, \eta) \in \Theta \times H\},$$

then there is a uniformly most powerful test for the hypothesis testing problem

$$
\begin{aligned}
&\text{Hypothesis: } \theta = \theta_0, \quad \eta \in H, \\
&\text{Alternative: } \theta = \theta_1, \quad \eta \in H;
\end{aligned}
\tag{6}
$$

it can be chosen to have power independent of $\eta$.
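Proof. Consider the related problem having a simple alternative:

$$
\begin{aligned}
&\text{Hypothesis: } \theta = \theta_0, \quad \eta \in H, \\
&\text{Alternative: } \theta = \theta_1, \quad \eta = \eta_1.
\end{aligned}
\tag{7}
$$

To find a most powerful size $\alpha$ test for this problem, we look for a least favorable probability distribution over the hypothesis; a natural choice is to assign all probability to the hypothesis parameter value having $\eta = \eta_1$. We then consider

$$
\begin{aligned}
&\text{Hypothesis: } \theta = \theta_0, \quad \eta = \eta_1, \\
&\text{Alternative: } \theta = \theta_1, \quad \eta = \eta_1.
\end{aligned}
\tag{8}
$$

Let $\phi(x)$ be any size $\alpha$ test for this problem and let $\psi(t)$ be the most powerful size $\alpha$ test over $\mathcal{T}$ for the hypothesis $P^t_{\theta_0}$ against the alternative $P^t_{\theta_1}$; we show that $\psi(t(x))$ is of size $\alpha$ for (8) and has power greater than or equal to the power of $\phi(x)$; that is, we show that $\psi(t(x))$ is the most powerful test for (8). Since

$$E_{\theta_0\eta_1}\{\psi(t(X))\} = E^t_{\theta_0}\{\psi(T)\} \le \alpha,$$

it follows that $\psi(t(x))$ is of size $\alpha$ for (8). Defining $\psi^*(t)$ by

$$\psi^*(t) = E_{\eta_1}\{\phi(X) \mid t(X) = t\},$$

we have

$$
\begin{aligned}
P_{\psi(t(x))}(\theta_1, \eta_1) &= E_{\theta_1\eta_1}\{\psi(t(X))\} \\
&= E^t_{\theta_1}\{\psi(T)\} \\
&\ge E^t_{\theta_1}\{\psi^*(T)\} \\
&= E_{\theta_1\eta_1}\{\phi(X)\} \\
&= P_{\phi(x)}(\theta_1, \eta_1),
\end{aligned}
$$

where the inequality holds because $\psi^*(t)$ is itself a test of $P^t_{\theta_0}$ against $P^t_{\theta_1}$ with size $E^t_{\theta_0}\{\psi^*(T)\} = E_{\theta_0\eta_1}\{\phi(X)\} \le \alpha$, while $\psi(t)$ is most powerful for that problem. This proves that $\psi(t(x))$ is at least as powerful as $\phi(x)$.

Now, since

$$E_{\theta_0\eta}\{\psi(t(X))\} = E^t_{\theta_0}\{\psi(T)\} \le \alpha \quad \text{for all } \eta \in H,$$

it follows that $\psi(t(x))$ is a size $\alpha$ test for (7). Hence, it is the most powerful size $\alpha$ test for (7). Also, since the choice of $\psi(t)$ did not depend on $\eta_1$, it follows that $\psi(t(x))$ is most powerful for each $\eta_1$, and hence is the uniformly most powerful test for (6). From the properties of the statistic $t(x)$, it is obvious that the power of $\psi(t(x))$ is independent of $\eta$. This completes the proof.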







Also for estimation theory we have the following simple extension to a theorem of Lehmann and Scheffé [1].

Theorem 3. If $t(x)$ is a sufficient $(\theta)$ statistic for $\{P_{\theta\eta} \mid (\theta, \eta) \in \Theta \times H\}$, if the class of measures $\{P^t_\theta \mid \theta \in \Theta\}$ is complete, and if $g(\theta)$ is a real estimable parameter, then there is an essentially unique unbiased estimator with minimum variance and minimum risk (strictly convex loss); this estimator is the only unbiased estimator which is a function of $t(x)$. For vector parameters read ellipsoid of concentration for variance.

Proof. The proof is essentially that found in [1]; we sketch the one point of difference. Let $f(x)$ be an unbiased estimator for $g(\theta)$ and define

$$f_\eta(t) = E_\eta\{f(X) \mid t(X) = t\}.$$

It is then easily seen that $f_\eta(t(x))$ is an unbiased estimator for $g(\theta)$ and, from completeness, that $f_\eta(t)$ is essentially independent of $\eta$: for any two values $\eta, \eta'$, the difference $f_\eta(t) - f_{\eta'}(t)$ has expectation zero under every $P^t_\theta$.


4. An example. In [2] the sign test was shown to be a uniformly most powerful test for the nonparametric formulation of the problem of location. As an example we show that this result derives from our Theorem 2. Let $X_1, \dots, X_n$ be independent and let each $X_i$ have the same distribution function $F_\theta(x)$, where $\{F_\theta \mid \theta \in \Omega\}$ is the class of continuous distribution functions on the real line. Let $\xi_\beta(F)$ designate the $\beta$-th fractile of the distribution $F$, that is,

$$F(\xi_\beta) = \beta;$$

if this is not unique, let $\xi_\beta$ be any one of the possible values. Then we can describe the one-sided nonparametric location problem by

$$
\begin{aligned}
&\text{Hypothesis: } \xi_\beta(F_\theta) = 0, \quad \theta \in \Omega, \\
&\text{Alternative: } \xi_\beta(F_\theta) > 0, \quad \theta \in \Omega.
\end{aligned}
\tag{9}
$$


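For this problem the parameter space is $\Omega$, or equivalently the space of continuous distribution functions on the real line. We define three parameters $p_\theta$, $F_\theta^-(x)$, $F_\theta^+(x)$ by the relations

$$
\begin{aligned}
p_\theta &= F_\theta(0), \\
F_\theta^-(x) &= \begin{cases} F_\theta(x)/F_\theta(0) & \text{if } x \le 0, \\ 1 & \text{if } x > 0, \end{cases} \\
F_\theta^+(x) &= \begin{cases} 0 & \text{if } x \le 0, \\ \dfrac{F_\theta(x) - F_\theta(0)}{1 - F_\theta(0)} & \text{if } x > 0. \end{cases}
\end{aligned}
$$

If $p_\theta = 0$ or $1$, then $F_\theta^-(x)$ or $F_\theta^+(x)$, respectively, is indeterminate; for the sake of definiteness they can be taken to be the distribution functions of the uniform distribution on $[-1, 0]$ and on $[0, 1]$. These three parameters (one real, two functional) give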





842 W. A. THOMPSON, JR. 


respectively the probability to the left of the origin, the relative distribution on the negative axis, and the relative distribution on the positive axis. Also, for any possible values for the parameters $p$, $F^-(x)$, $F^+(x)$ ($F^-$, $F^+$ continuous), there corresponds a continuous distribution $F_\theta(x)$.

The hypothesis testing problem (9) can be written equivalently

$$
\begin{aligned}
&\text{Hypothesis: } p_\theta = \beta, \quad \theta \in \Omega, \\
&\text{Alternative: } p_\theta < \beta, \quad \theta \in \Omega,
\end{aligned}
\tag{10}
$$

and it is obvious that this is a problem for which a sufficient statistic $(p_\theta)$ is appropriate. Let $i(x_1, \dots, x_n)$ be the number of positive $x_i$. Obviously the distribution of $i(X_1, \dots, X_n)$ depends only on $p_\theta$, and the conditional distribution of $(X_1, \dots, X_n)$ given $i(X_1, \dots, X_n) = r$ depends only on $F_\theta^-$, $F_\theta^+$. Hence, $i(x_1, \dots, x_n)$ is sufficient $(p_\theta)$. For the binomial problem of testing $p_\theta = \beta$ against $p_\theta < \beta$, the sign test to reject for large values of $i(x_1, \dots, x_n)$ is uniformly most powerful. Then by Theorem 2 it is the uniformly most powerful test for the nonparametric location problem.
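The final step can be made concrete. Under the hypothesis of (10) the count $i$ of positive observations is binomial with $n$ trials and success probability $1 - \beta$ (ties at zero have probability zero because $F_\theta$ is continuous), and the sign test rejects for large $i$. A minimal sketch of the exact one-sided p-value, assuming only Python's standard library; the function name and the sample are illustrative:

```python
from math import comb


def sign_test_pvalue(sample, beta=0.5):
    """Exact p-value Pr{i >= i_obs} when i ~ Binomial(n, 1 - beta).

    i is the number of positive observations; beta is the hypothesized
    value of p = F(0), the probability to the left of the origin.
    """
    n = len(sample)
    i_obs = sum(1 for x in sample if x > 0)
    q = 1.0 - beta                       # probability that an observation is positive
    return sum(comb(n, r) * q**r * (1.0 - q) ** (n - r) for r in range(i_obs, n + 1))


print(sign_test_pvalue([0.4, 1.2, -0.3, 2.2, 0.9, 1.7, 0.1, 0.8]))  # 9/256 = 0.03515625
```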


REFERENCES


[1] E. L. Lehmann and H. Scheffé, "Completeness, similar regions and unbiased estimation," Sankhyā, Vol. 10 (1950), pp. 305-340.

[2] D. A. S. Fraser, "Nonparametric theory. Scale and location parameters," Canadian J. Math., Vol. 6 (1953), pp. 46-48.




A NOTE ON THE BALANCED INCOMPLETE BLOCK DESIGNS


By W. A. THOMPSON, JR.
Virginia Polytechnic Institute

0. Summary. It is a well-known property of the BIB design that all treatment effects are estimated with the same accuracy, i.e., that the variances of the estimates of the treatment effects are all equal and their covariances are also all equal. We show that the converse is also true. If the estimates of the treatment effects in an incomplete block design all have the same variances and the same covariances, then the design is a BIB.


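1. A matrix result. If

$$C = \begin{pmatrix} a & b & \cdots & b \\ b & a & \cdots & b \\ \vdots & \vdots & \ddots & \vdots \\ b & b & \cdots & a \end{pmatrix}$$

is a $v \times v$ matrix, then $C$ has characteristic roots $a + (v-1)b$ and $a - b$, the latter of multiplicity $v - 1$. We need a second result which is a partial converse: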


Received April 12, 1955 











BALANCED INCOMPLETE BLOCK DESIGNS $43 


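Lemma. If $C$ is a symmetric $v \times v$ matrix such that the sum of all elements in any given row is zero and if $e$ ($e \ne 0$) is a characteristic root of $C$ with multiplicity $(v - 1)$, then

$$C = \frac{e}{v}\begin{pmatrix} v-1 & -1 & \cdots & -1 \\ -1 & v-1 & \cdots & -1 \\ \vdots & \vdots & \ddots & \vdots \\ -1 & -1 & \cdots & v-1 \end{pmatrix}.$$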
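Proof. A vector

$$m_i = \begin{pmatrix} m_{i1} \\ m_{i2} \\ \vdots \\ m_{iv} \end{pmatrix}$$

is called a characteristic vector of $C$ corresponding to the characteristic root $e_i$ if

$$(C - e_i I)\begin{pmatrix} m_{i1} \\ \vdots \\ m_{iv} \end{pmatrix} = 0.$$

From

$$C\begin{pmatrix} 1 \\ \vdots \\ 1 \end{pmatrix} = 0,$$

we see that

$$m_1 = \frac{1}{\sqrt{v}}\begin{pmatrix} 1 \\ \vdots \\ 1 \end{pmatrix}$$

is a characteristic vector corresponding to 0. We may choose $v$ such characteristic vectors $m_1, \dots, m_v$ in such a way that $M = (m_1 \cdots m_v)$ is an orthogonal matrix. We will have

$$M'CM = \begin{pmatrix} 0 & 0 \\ 0 & eI_{v-1} \end{pmatrix}.$$

Thus,

$$C = M\begin{pmatrix} 0 & 0 \\ 0 & eI_{v-1} \end{pmatrix}M' = (m_1\ M_2)\begin{pmatrix} 0 & 0 \\ 0 & eI_{v-1} \end{pmatrix}\begin{pmatrix} m_1' \\ M_2' \end{pmatrix} = e\sum_{i=2}^{v} m_i m_i' = eM_2M_2',$$

where $M = (m_1\ M_2)$. Now,

$$I = MM' = (m_1\ M_2)\begin{pmatrix} m_1' \\ M_2' \end{pmatrix} = m_1m_1' + M_2M_2'.$$

Hence

$$M_2M_2' = I - m_1m_1' = I - \frac{1}{v}\begin{pmatrix} 1 & \cdots & 1 \\ \vdots & & \vdots \\ 1 & \cdots & 1 \end{pmatrix},$$

and so

$$C = eM_2M_2' = \frac{e}{v}\begin{pmatrix} v-1 & -1 & \cdots & -1 \\ -1 & v-1 & \cdots & -1 \\ \vdots & \vdots & \ddots & \vdots \\ -1 & -1 & \cdots & v-1 \end{pmatrix}.$$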
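Both matrix facts of this section are easy to confirm numerically. The sketch below, assuming NumPy, uses arbitrary illustrative values of $v$, $a$, $b$ and $e$; it checks the characteristic roots of the matrix with $a$ on the diagonal and $b$ elsewhere, and that the matrix in the Lemma has zero row sums and the root $e$ with multiplicity $v - 1$.

```python
import numpy as np

v, a, b, e = 6, 2.0, -0.3, 1.7          # illustrative values
ones = np.ones((v, v))

# a on the diagonal, b off the diagonal
C_equi = (a - b) * np.eye(v) + b * ones
roots_equi = np.linalg.eigvalsh(C_equi)  # ascending order
expected_equi = np.sort([a + (v - 1) * b] + [a - b] * (v - 1))

# (e / v) times the matrix with v - 1 on the diagonal and -1 elsewhere
C_lemma = (e / v) * (v * np.eye(v) - ones)
roots_lemma = np.linalg.eigvalsh(C_lemma)
row_sums = C_lemma.sum(axis=1)
print(roots_equi, roots_lemma, row_sums)
```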
2. Application to designs. We deal with an arbitrary incomplete block design with $v$ treatments, $b$ blocks, $k$ plots in a block, and $r$ repetitions of each treatment. $\tau_i$ will be the effect of the $i$th treatment, and $\beta_j$ the effect of the $j$th block. If treatment $i$ appears in block $j$, then we assume the observation $y_{ij}$ has the form

$$y_{ij} = \tau_i + \beta_j + \epsilon_{ij}. \tag{2.1}$$

$\hat\tau_i$, the estimate of $\tau_i$, is computed by equating the adjusted treatment totals $(Q_i)$ to their expectations and solving for the $\tau$'s:

$$c_{i1}\hat\tau_1 + c_{i2}\hat\tau_2 + \cdots + c_{iv}\hat\tau_v = Q_i \qquad (i = 1, 2, \dots, v), \tag{2.2}$$

where

$$c_{ii} = \frac{r(k-1)}{k}, \qquad c_{ij} = -\frac{\lambda_{ij}}{k} \quad (i \ne j),$$

and $\lambda_{ij}$ is the number of blocks in which the treatments $i$ and $j$ occur together. Also,

$$\operatorname{cov}(Q_i, Q_j) = c_{ij}\sigma^2, \qquad \operatorname{var} Q_i = c_{ii}\sigma^2,$$







or the covariance matrix of Q;, ---: , Q» is 


(2.3) Co’. 


It is well known that for a connected design, the rows and columns of C sum 
to 0 and its rank is v — 1. Hence, 0 is a characteristic root of multiplicity 1. 
We now transform from Q,, --- ,Q,.toza,-+-+,2: 


(2.4) Q = Tz, 
where 7 is an orthogonal matrix such that 
é O 
T'CT = : = 
0 0 
The covariance matrix of 21, ---* , 2 is 
2.5) Do’. 
Also, on 7,,--+- , 7, the solutions of (2.2), we make the substitution # = 7T’n. 
These two substitutions and (2.2) then imply 
CTn = Tz, 
or 


(2.6) Dan = 2. 


If the covariance matrix of #,,--- , # is Ao’, then the covariance matrix of 
m,°:*, % is M’AMo’ = Dx,o’. This is diagonal, since the z’s are uncorre- 
lated and en; = 2; (e; ~ 0) for all but one of the n’s; we may also verify that 
this remaining n is zero with probability one. 

From (2.5) and (2.6) we have 


D. = D.DaaD, , 
1 = e,(A)e; for e; ~ 0, and finally 
e(A) = 1/e; for e; ~ 0. 


We are now ready to assemble the parts and exhibit the promised result. 

THEOREM. If the estimates of the treatment effects in an incomplete block design all have the same variances and the same covariances, then the design is BIB.

PROOF. Let the characteristic roots of $C$ be $e_0 = 0, e_1, e_2, \ldots, e_{v-1}$. By hypothesis, $A$ has all its diagonal elements equal and all its off-diagonal elements equal,







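and hence has two different characteristic roots: one belongs to the characteristic vector with all elements equal, which is also the characteristic vector of $C$ for $e_0 = 0$, and the other is common to all vectors orthogonal to it. The roots $e_i(A) = 1/e_i$, $i = 1, \ldots, v - 1$, belong to characteristic vectors orthogonal to the vector with equal elements, so they all equal this second root, say $e(A)$, and

$$e_i = 1/e(A) \quad \text{for } e_i \neq 0.$$

Therefore, $C$ has roots $e_0 = 0$ and $e = 1/e(A)$, the latter of multiplicity $v - 1$. Using the lemma of Section 1,

$$C = \frac{e}{v}\begin{pmatrix}v-1 & -1 & \cdots & -1\\ -1 & v-1 & \cdots & -1\\ \vdots & \vdots & \ddots & \vdots\\ -1 & -1 & \cdots & v-1\end{pmatrix},$$

which says that $\lambda_{ij}$ is the same for every pair $i \neq j$, and hence our design is BIB.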
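As a small illustration of the lemma and the theorem, the Python sketch below (our own; the helper names and the two example designs are hypothetical, not from the paper) builds the matrix $C$ of (2.2) for a proper, binary design, using $c_{ii} = r(k-1)/k$ and $c_{ij} = -\lambda_{ij}/k$, and tests whether $C$ has the form $(e/v)(vI - J)$ given by the lemma of Section 1, $J$ being the matrix of ones. The seven-block design with blocks $\{i, i+1, i+3\} \pmod 7$ is a BIB design; the cyclic design with blocks of size 2 is not.

```python
from fractions import Fraction
from itertools import combinations


def reduced_c_matrix(blocks, v):
    """Matrix C of the reduced intrablock equations (2.2) for a proper,
    binary design: c_ii = r_i(k-1)/k, c_ij = -lambda_ij/k (exact fractions)."""
    k = len(blocks[0])
    assert all(len(b) == k for b in blocks), "blocks must all have k plots"
    r = [sum(i in b for b in blocks) for i in range(v)]
    lam = [[0] * v for _ in range(v)]  # lam[i][j]: blocks containing both i and j
    for b in blocks:
        for i, j in combinations(sorted(b), 2):
            lam[i][j] += 1
            lam[j][i] += 1
    return [[Fraction(r[i] * (k - 1), k) if i == j else Fraction(-lam[i][j], k)
             for j in range(v)] for i in range(v)]


def has_lemma_form(C):
    """True when C = (e/v)(vI - J): diagonal (v-1)e/v, off-diagonal -e/v,
    i.e. roots 0 (once) and e (v-1 times), as in the lemma of Section 1."""
    v = len(C)
    off = C[0][1]
    e = -off * v
    return all(C[i][j] == (Fraction(v - 1, v) * e if i == j else off)
               for i in range(v) for j in range(v))


# Hypothetical example designs (treatments numbered 0..v-1).
bib_7 = [{(i + d) % 7 for d in (0, 1, 3)} for i in range(7)]  # v = b = 7, k = r = 3, lambda = 1
cyclic_4 = [{0, 1}, {1, 2}, {2, 3}, {3, 0}]                      # some pairs never meet

print(has_lemma_form(reduced_c_matrix(bib_7, 7)))     # True
print(has_lemma_form(reduced_c_matrix(cyclic_4, 4)))  # False
```

For the BIB design every off-diagonal element is $-1/3$, so $e = 7/3$; the cyclic design fails because treatments that never share a block give zero off-diagonal elements while neighbours give $-1/2$.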




THE EFFICIENCY FACTOR OF AN INCOMPLETE BLOCK DESIGN¹
By OSCAR KEMPTHORNE


Statistical Laboratory, Ames, Iowa 


1. Summary. It is shown that the efficiency factor of a design is $1/r$ times the harmonic mean of the latent roots of the reduced intrablock normal equations, excluding the root which is always zero.


2. Properties of reduced normal equations. We consider an incomplete block 
design with the following properties: 

(1) There are rt units arranged in b blocks of k units; 

(2) Every one of the $t$ treatments occurs $r$ times;

(3) A treatment occurs once or not at all in a block. (This condition is easily 
seen to be unnecessary for all the results given below.) 

Then, it is well known (see for example Kempthorne [1], pp. 541-543) that the reduced normal equations for intrablock estimates of the treatment effects $\tau_i$ are of the form

$$\left(rI - \frac{1}{k}A\right)(\hat\tau) = (Q),$$

where $A_{ii}$ equals $r$ for all $i$, $A_{ij}$ ($= A_{ji}$) is the number of blocks which contain both treatments $i$ and $j$, and $Q_i$ is the total for treatment $i$ adjusted for blocks. Also, the interblock estimates are given by the equation

$$(A)(\tilde\tau) = (R),$$

where $R_i$ is equal to the total of blocks containing treatment $i$ minus $r/b$ times the grand total, and the condition $\sum_i \tilde\tau_i = 0$ is imposed for the nonestimable quantity $\sum_i \tau_i$.


Received July 5, 1955 


¹ Journal paper No. J-2819 of the Iowa Agricultural Experiment Station, Ames, Iowa, Project 890.







We first note that the matrix $A$ is real and symmetric, so there exists an orthogonal matrix $O$,

$$OO' = I = O'O,$$

such that

$$OAO' = D,$$

where $D$ is a diagonal matrix with, say, $D_{jj} = d_j$, where the $d_j$'s are the latent roots of $A$. It is also obvious that

$$O\left(rI - \frac{1}{k}A\right)O' = rI - \frac{1}{k}D$$

is a diagonal matrix for which the $(jj)$th element is $r - d_j/k$. Now, let $O\tau = \rho$. Then we have

$$\left(rI - \frac{1}{k}D\right)\hat\rho = OQ, \qquad D\tilde\rho = OR.$$


We note immediately what was perhaps entirely obvious without the above elementary manipulations: if $\hat\rho_j$ exists (i.e., if there is an intrablock estimate of $\rho_j$), there may or may not be an interblock estimate of $\rho_j$; but if $\hat\rho_j$ does not exist (i.e., if no intrablock estimate of $\rho_j$ is given by the design), then $r - d_j/k$ is equal to zero, and hence $d_j$ is not zero and an interblock estimate of $\rho_j$ exists. Similarly, if $\tilde\rho_j$ does not exist, then $\hat\rho_j$ exists. Of course, in the case of many designs, $\hat\rho_j$ and $\tilde\rho_j$ both exist for some $j$.

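It is also known that if $W$ and $W'$ denote respectively $1/\sigma^2$ and $k/\sigma_1^2$, where $\sigma^2$ is the variance within blocks and $\sigma_1^2$ is the variance between blocks of block totals, then the combined estimates of $\tau_i$, say $\tau_i^*$, are given by

$$\left[W\left(rI - \frac{1}{k}A\right) + \frac{W'}{k}A\right](\tau^*) = WQ + \frac{W'}{k}R,$$

so that $\rho^*$ is given by the equation

$$\left[W\left(rI - \frac{1}{k}D\right) + \frac{W'}{k}D\right]\rho^* = WOQ + \frac{W'}{k}OR,$$

that is,

$$\left[W\left(r - \frac{d_j}{k}\right) + \frac{W'}{k}d_j\right]\rho_j^* = \left[WOQ + \frac{W'}{k}OR\right]_j,$$

$$\left[Wr - (W - W')\frac{d_j}{k}\right]\rho_j^* = \left[WOQ + \frac{W'}{k}OR\right]_j.$$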


This equation is, among other things, a mathematical presentation of the usual rules for analyzing quasifactorial designs in terms of effects and interactions of pseudo-factors. In that case, the estimates $\rho_j$ can be written down by inspection







of the design, and one merely has to use $\tau^* = O'\rho^*$ to get the combined estimates of the $\tau_i$'s.


3. The efficiency factor of a design. The efficiency factor (EF) of an incomplete block design is defined to be

$$\mathrm{EF} = \frac{\text{mean variance of treatment differences in complete block design}}{\text{mean variance of intrablock estimates of treatment differences in incomplete block design}} \times \frac{\text{within-block variance in incomplete block design}}{\text{within-block variance in complete block design}}.$$

Alternatively, the efficiency factor may be defined as the first factor of the above 
right-hand side, assuming the within-block variances are the same in the two 
designs. The first factor on the right-hand side above gives the relative effi- 
ciency of the design. The efficiency factor does not depend on the actual within- 
block variances and is purely a property of the design. 

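We now proceed to get the relationship of the efficiency factor of the design to the $d_j$, which are the latent roots of the matrix $A$. We have the identity

$$\frac{1}{t(t-1)}\sum_{i \neq j}(\hat\tau_i - \hat\tau_j)^2 = \frac{2}{t-1}\sum_i \hat\tau_i^2 - \frac{2}{t(t-1)}\Bigl(\sum_i \hat\tau_i\Bigr)^2.$$

Hence,

$$E\Bigl[\frac{1}{t(t-1)}\sum_{i \neq j}(\hat\tau_i - \hat\tau_j)^2 \Bigm| \text{all } \tau_i = 0\Bigr] = \frac{2}{t-1}\,E\Bigl[\sum_i \hat\tau_i^2 \Bigm| \text{all } \tau_i = 0\Bigr]$$

when the condition $\sum_i \hat\tau_i = 0$ is imposed.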


One of the roots of $rI - (1/k)A$ is zero regardless of the design, within the restrictions given above, and this corresponds to the fact that the reduced normal equations for treatments can be summed to give the equation

$$0\Bigl(\sum_i \hat\tau_i\Bigr) = 0,$$

so that $\sum_i \tau_i$ is not estimable. We shall suppose that the root of $(rI - (1/k)A)$ which is zero regardless of the design is the root $r - d_1/k$. It will then follow that $(OQ)_1$ is identically zero. We take the characteristic vector corresponding to this root to have all its elements equal. The imposition of the condition $\sum_i \hat\tau_i = 0$ will then cause the imposition of the condition $\hat\rho_1 = 0$. Hence, we have

$$E\Bigl(\sum_i \hat\tau_i^2 \Bigm| \text{all } \tau_i = 0\Bigr) = E\Bigl(\sum_j \hat\rho_j^2 \Bigm| \text{all } \tau_i = 0\Bigr),$$







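and $\hat\rho_1$ is always zero. Also,

$$V(\hat\rho_j) = \frac{\sigma_I^2}{r - d_j/k}, \qquad j = 2, \ldots, t,$$

where $\sigma_I^2$ is the within-block variance for the incomplete block design, so

$$E\Bigl[\frac{1}{t(t-1)}\sum_{i \neq j}(\hat\tau_i - \hat\tau_j)^2 \Bigm| \text{all } \tau_i = 0\Bigr] = \frac{2\sigma_I^2}{t-1}\sum_{j=2}^{t}\frac{1}{r - d_j/k},$$

which is the mean variance of a treatment difference.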

For the complete block design, the mean variance of a treatment difference is $2\sigma_C^2/r$, where $\sigma_C^2$ is the variance within blocks for the complete block design.

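Hence, we have the final result. The efficiency factor (EF) of an incomplete block design is equal to $1/r$ times the harmonic mean of the latent roots of the matrix of coefficients of the reduced normal equations for the intrablock estimates, excluding the always-present zero root, whose characteristic vector consists of the same number repeated $t$ times:

$$\mathrm{EF} = \frac{2\sigma_C^2/r}{\dfrac{2\sigma_I^2}{t-1}\displaystyle\sum_{j=2}^{t}\frac{1}{r - d_j/k}}\cdot\frac{\sigma_I^2}{\sigma_C^2} = \frac{1}{r}\cdot\frac{t-1}{\displaystyle\sum_{j=2}^{t}\frac{1}{r - d_j/k}}.$$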
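The result lends itself to direct computation. The Python sketch below (ours; the function names and example designs are hypothetical) forms $rI - (1/k)A$ and adds $J/t$, a device we introduce so that the always-zero root, whose characteristic vector has equal elements, becomes 1 while every other root $r - d_j/k$ is unchanged. The sum of reciprocals of the nonzero roots is then the trace of the inverse minus 1, and the function returns $(t-1)\big/\bigl(r\sum_j 1/(r - d_j/k)\bigr)$. Exact rational arithmetic is used, and the design is assumed proper, equireplicate, binary and connected.

```python
from fractions import Fraction


def invert(M):
    """Exact Gauss-Jordan inverse of a nonsingular square matrix of Fractions."""
    n = len(M)
    A = [list(row) + [Fraction(int(i == j)) for j in range(n)] for i, row in enumerate(M)]
    for col in range(n):
        piv = next(r for r in range(col, n) if A[r][col] != 0)
        A[col], A[piv] = A[piv], A[col]
        p = A[col][col]
        A[col] = [x / p for x in A[col]]
        for r in range(n):
            if r != col and A[r][col] != 0:
                f = A[r][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [row[n:] for row in A]


def efficiency_factor(blocks, t):
    """EF = (t - 1) / (r * sum_j 1/(r - d_j/k)), the sum running over the
    nonzero roots of rI - A/k."""
    k = len(blocks[0])
    r = sum(0 in b for b in blocks)  # replications (equireplicate design assumed)
    A = [[sum(i in b and j in b for b in blocks) for j in range(t)] for i in range(t)]
    # rI - A/k + J/t: the zero root (vector of equal elements) becomes 1,
    # every other root r - d_j/k is unchanged.
    M = [[Fraction(r if i == j else 0) - Fraction(A[i][j], k) + Fraction(1, t)
          for j in range(t)] for i in range(t)]
    Minv = invert(M)
    s = sum(Minv[i][i] for i in range(t)) - 1  # sum of 1/(r - d_j/k), zero root excluded
    return Fraction(t - 1) / (r * s)


bib_7 = [{(i + d) % 7 for d in (0, 1, 3)} for i in range(7)]
cyclic_4 = [{0, 1}, {1, 2}, {2, 3}, {3, 0}]
print(efficiency_factor(bib_7, 7))     # 7/9
print(efficiency_factor(cyclic_4, 4))  # 3/5
```

For the seven-treatment BIB design this gives $7/9$, which agrees with the familiar value $\lambda t/(rk)$ for BIB designs; for the cyclic design the nonzero roots are 1, 1 and 2, and the efficiency factor is $3/5$.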

It may be of interest to record the viewpoint that while the efficiency factor is a reasonable criterion of the loss due to confounding by blocking, from some points of view the generalized variance would be better. This, of course, corresponds in a certain sense to the geometric mean of the latent roots.


4. Notes on the Result. The result is interesting to the author and appears 
to be worth recording in the literature. It was obtained in a search for a proof 
of a theorem that the design with the highest efficiency factor is a balanced in- 
complete block design if such a design exists. To the author’s knowledge, this 
theorem is yet to be proved. 


REFERENCE 
[1] Oscar Kempthorne, The Design and Analysis of Experiments, John Wiley and Sons, New York, 1952.




THE NULL DISTRIBUTION OF THE DIFFERENCE BETWEEN THE 
TWO LARGEST SAMPLE VALUES¹


By J. ST-PIERRE AND A. ZINGER


University of Montreal, Canada 


1. Introduction. A decision procedure to select the population with the largest mean, proposed by Bose and St-Pierre [1], involves the auxiliary statistic $u = x_{(1)} - x_{(2)}$, where $x_{(1)}$ and $x_{(2)}$ are respectively the largest and second largest


Received July 20, 1955 
¹ Work done under the sponsorship of the National Research Council of Canada.







TABLE I
Φ_n(u)

   u     n = 2     n = 3     n = 4     n = 5     n = 6     n = 7
  0.0   .00 000   .00 000   .00 000   .00 000   .00 000   .00 000
  0.2   .16 321   .19 439   .21 630   .23 298   .24 623   .25 720
  0.4   .31 250   .36 450   .39 967   .42 565   .44 592   …239
  0.6   .44 573   .50 943   .55 078   .58 040   .60 299   …2 100
  0.8   .56 178   .62 964   …183      .70 110   .72 289   …3 993
  1.0   .66 040   .72 669   …607      .79 249   .81 168   …2 639
  1.2   .74 217   .80 295   .83 737   .85 968   .87 547   …8 73…
  1.4   .80 831   .86 126   .88 975   .90 760   .91 985   …
  1.6   .86 051   .90 464   …718      .94 078   .94 990   …648
  1.8   .90 067   .93 604   …315      .96 307   .96 956   …416
  2.0   .93 081   .95 815   .97 065   .97 761   .98 204   .98 515
  2.2   .95 287   .97 329   …208      .98 676   .98 966   .99 168
  2.4   .96 861   .98 335   …926      .99 223   .99 398   …515
  2.6   .97 957   .98 990   .99 383   .99 577   .99 688   .99 770


values in a sample of size $n + 1$ taken from a normal population with zero mean and unit variance. The null distribution of $u$ might be obtained from a formula given by Irwin [2], but computations based on it seem rather complicated. Another form of the distribution was obtained by St-Pierre [3], involving iterated integrals of the normal density.


2. The null distribution of u. Let us denote by $\varphi_n(u)$ the p.d.f. of $u = x_{(1)} - x_{(2)}$ in the case of a sample of size $n + 1$. An expression for $\varphi_n(u)$, more amenable to calculations than the ones previously mentioned, can be derived using, as a starting point, the joint distribution of the ordered sample values [4]. It has the following form:

$$\varphi_n(u) = (n+1)!\,e^{[\ldots]}\int\!\cdots\!\int [\ldots]\,dy_1\cdots dy_{n-1}.$$

3. Tabulation of $\Phi_n(u)$, the c.d.f. of u. Rapidly converging series expansions of $\varphi_n(u)$ were derived. With the help of [5], [6], and [7], $\varphi_n(u)$ was computed for $u = 0.0(0.2)2.6$. Finally $\Phi_n(u)$ was obtained by numerical integration methods [8]. Table I gives $\Phi_n(u)$ for $n = 2, 3, \ldots, 7$.


4. Remark. As was pointed out by the referee, $\Phi_n(u)$ can be obtained directly. It can be shown that

$$P\{x_{(1)} - x_{(2)} \le \lambda\} = \Phi_n(\lambda) = 1 - (n+1)\,[\ldots].$$

The terms involving integrals of the type

$$\int_{-\infty}^{\infty} e^{-y^2/2}\left(\int_{y}^{y+\lambda} e^{-t^2/2}\,dt\right)^{j} dy$$







can be evaluated by interpolation using equation (20) and Table I of [9]. However, the number of decimals in Table I of [9] is not sufficiently large to yield accurate enough calculated values of $\Phi_n(\lambda)$.
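An independent numerical check on Table I is possible. The gap $u$ exceeds $\lambda$ exactly when some one observation exceeds all $n$ others by more than $\lambda$, and these $n + 1$ events are disjoint, so $\Phi_n(\lambda) = 1 - (n+1)\int_{-\infty}^{\infty} p(y)\,F(y-\lambda)^n\,dy$, where $p$ and $F$ are the standard normal density and distribution function. This representation and the Python sketch below, which evaluates it by Simpson's rule on $[-10, 10]$, are our own illustration rather than the referee's formula; the function names, the integration range and the number of subintervals are assumptions.

```python
import math


def normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def simpson(f, a, b, m=2000):
    """Composite Simpson's rule on [a, b] with m (even) subintervals."""
    if m % 2:
        m += 1
    h = (b - a) / m
    total = f(a) + f(b)
    for i in range(1, m):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0


def gap_cdf(n, lam, lo=-10.0, hi=10.0):
    """P{x_(1) - x_(2) <= lam} for a sample of size n + 1 from N(0, 1)."""
    if lam < 0:
        return 0.0

    def integrand(y):
        # density that a given observation sits at y, times the probability
        # that the other n observations all lie below y - lam
        return normal_pdf(y) * normal_cdf(y - lam) ** n

    return 1.0 - (n + 1) * simpson(integrand, lo, hi)


for n in (2, 7):
    print(n, [round(gap_cdf(n, u / 10), 5) for u in range(0, 27, 2)])
```

For $n = 2$ and $\lambda = 2.0$ the computed value agrees closely with the tabulated .93081; the same function could be used to cross-check other entries of Table I.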


REFERENCES 


[1] R. C. Bose and J. St-Pierre, “On a decision procedure to select the population with the largest mean,” Abstract, Ann. Math. Stat., Vol. 25 (1954), pp. 813-814.

[2] J. O. Irwin, “The further theory of Francis Galton's individual difference problem,” Biometrika, Vol. 17 (1925), pp. 100-128.

[3] J. St-Pierre, “Distribution of linear contrasts of order statistics,” Institute of Statistics, University of North Carolina, Chapel Hill, July, 1954.

[4] S. S. Wilks, Mathematical Statistics, Princeton University Press, Princeton, N. J., 1943.

[5] Tables of Circular and Hyperbolic Sines and Cosines for Radian Arguments, National Bureau of Standards, 1939.

[6] Tables of the Exponential Function e^x, Applied Mathematics Series, XIV, National Bureau of Standards, 1951.

[7] Tables of Normal Probability Functions, Applied Mathematics Series, XXIII, National Bureau of Standards, 1953.

[8] C. Jordan, Calculus of Finite Differences, Chelsea Publishing Company, New York, 1947.

[9] R. E. Bechhofer, “A single-sample multiple decision procedure for ranking means of normal populations with known variances,” Ann. Math. Stat., Vol. 25 (1954), pp. 16-40.




A CERTAIN CLASS OF SOLUTIONS TO A MOMENT PROBLEM 


By LIONEL WEISS
University of Oregon 


1. Summary. A uniqueness and a characterization theorem are given for the 
density function over the interval [—1, 1] with a given finite sequence of mo- 
ments whose square has the smallest possible integral. Extensions are indicated. 


2. Existence and characterization theorems. Let $\mu_0 = 1, \mu_1, \ldots, \mu_n$ be a given set of real numbers ($0 \le n < \infty$). Necessary and sufficient conditions on $(\mu_0, \ldots, \mu_n)$ that there be at least one density function $f(x)$ over $[-1, 1]$ with

$$\int_{-1}^{1} x^i f(x)\,dx = \mu_i, \quad i = 0, \ldots, n, \qquad \int_{-1}^{1} f^2(x)\,dx < \infty, \tag{1.1}$$

have been given [1]. Throughout this paper, we shall assume that the sequence $(\mu_0, \ldots, \mu_n)$ satisfies these conditions. Then we have:

THEOREM 1. Let $\{f\}$ denote the class of density functions over $[-1, 1]$ satisfying (1.1), and let $M$ denote

$$\operatorname*{g.l.b.}_{f(x)\text{ in }\{f\}} \int_{-1}^{1} f^2(x)\,dx.$$


Received May 18, 1955 







There is a function $g(x)$ in $\{f\}$ with $\int_{-1}^{1} g^2(x)\,dx = M$. Any function in $\{f\}$ with this property equals $g(x)$ almost everywhere.

PROOF. We can find a sequence $f_1(x), f_2(x), \ldots$ of functions in $\{f\}$ with $\int f_i^2$ approaching $M$ as $i$ increases. Let $\epsilon_i$ denote $\int f_i^2 - M$. Then $\int [f_i - f_j]^2 = 2M + \epsilon_i + \epsilon_j - 2\int f_i f_j \ge 0$, so $\int f_i f_j \le M + \frac12(\epsilon_i + \epsilon_j)$. Also, $\frac12(f_i + f_j)$ is in $\{f\}$, so that $\int [\frac12(f_i + f_j)]^2 = \frac14(M + \epsilon_i) + \frac14(M + \epsilon_j) + \frac12\int f_i f_j \ge M$, or $\int f_i f_j \ge M - \frac12(\epsilon_i + \epsilon_j)$. Thus, $\int f_i f_j$ approaches $M$ as $1/i + 1/j$ approaches zero, and therefore $\int [f_i - f_j]^2$ approaches zero with $1/i + 1/j$. But then it is known ([2], p. 243) that there is a measurable function $g(x)$ such that $\int [f_i - g]^2$ approaches zero as $i$ increases. But $g(x)$ is in $\{f\}$, for it must be non-negative almost everywhere on $(-1, 1)$, and, for $r = 0, \ldots, n$,

$$\left|\int_{-1}^{1} x^r g(x)\,dx - \mu_r\right| = \left|\int_{-1}^{1} x^r [g(x) - f_i(x)]\,dx\right| \le \left(\int_{-1}^{1} x^{2r}\,dx\right)^{1/2}\left(\int_{-1}^{1} [g - f_i]^2\,dx\right)^{1/2},$$

the term on the right approaching zero as $i$ increases. Also, $\int_{-1}^{1} g^2(x)\,dx = M$; for $\int g^2 = \int f_i^2 + \int (g - f_i)^2 + 2\int f_i(g - f_i)$, and as $i$ increases the expression on the right of this last equality approaches $M$. If a function $f(x)$ in $\{f\}$ has $\int_{-1}^{1} f^2(x)\,dx = M$, then $f(x) = g(x)$ almost everywhere. For $\frac12(f + g)$ is in $\{f\}$; therefore $\int [\frac12(f + g)]^2 = \frac12 M + \frac12\int fg \ge M$, or $\int fg \ge M$. But if $f$ fails to equal $g$ on a set of positive measure, $\int fg < (\int f^2)^{1/2}(\int g^2)^{1/2} = M$, a contradiction. Therefore $g(x)$ is essentially unique.

THEOREM 2. The function $g(x)$ described in Theorem 1 is, almost everywhere on $(-1, 1)$, equal to a certain polynomial $P(x)$ of degree at most $n$ wherever $P(x)$ is non-negative, and is equal to zero elsewhere.

PROOF. Denote the polynomial of degree $i$ in the sequence of polynomials orthonormal on a given bounded set $S$ of positive measure by $Q(x, i, S)$, so $\int_S Q(x, i, S)Q(x, j, S)\,dx = \delta_{ij}$ (the Kronecker delta). Since the sequence $\{Q(x, i, S)\}$ is complete in the class of functions whose squares are Lebesgue integrable over $S$, a necessary and sufficient condition that a function $r(x)$ in this class is, almost everywhere on $S$, equal to a polynomial of degree at most $n$ is that $\int_S r(x)Q(x, i, S)\,dx = 0$ for all $i > n$. Also, of all functions $s(x)$ whose squares are Lebesgue integrable and which have $\int_S s(x)x^i\,dx = c_i$, $i = 0, \ldots, n$, by Parseval's Theorem ([2], p. 251) the one with the smallest $\int_S s^2(x)\,dx$ is the polynomial $v(x)$ of degree at most $n$ uniquely determined by $\int_S v(x)x^i\,dx = c_i$, $i = 0, \ldots, n$. For any positive $\epsilon$, let $G_\epsilon$ denote the subset of $(-1, 1)$ where $g(x) \ge \epsilon$. Assume $\epsilon$ is small enough so that the measure of $G_\epsilon$ (written $m(G_\epsilon)$) is positive. Then, almost everywhere on $G_\epsilon$, $g(x)$ must be equal to a certain polynomial of degree at most $n$, say $P(x)$ ($P(x)$ will not depend on $\epsilon$). For if not, there is an $i > n$ so that $\int_{G_\epsilon} g(x)Q(x, i, G_\epsilon)\,dx \neq 0$. Then we can find a positive $\delta$ so that $g(x) + \gamma Q(x, i, G_\epsilon)$ is positive on $G_\epsilon$ for all $\gamma$ with $|\gamma| < \delta$. But a $\gamma_0$ with $0 < |\gamma_0| < \delta$ can be found so that

$$\int_{G_\epsilon}[g(x) + \gamma_0 Q(x, i, G_\epsilon)]^2\,dx = \int_{G_\epsilon} g^2 + 2\gamma_0\int_{G_\epsilon} gQ + \gamma_0^2 < \int_{G_\epsilon} g^2.$$










But then if we define $h(x)$ as equal to $g(x) + \gamma_0 Q(x, i, G_\epsilon)$ on $G_\epsilon$, and equal to $g(x)$ elsewhere, $h(x)$ is in $\{f\}$, and $\int_{-1}^{1} h^2 < \int_{-1}^{1} g^2$, a contradiction. Therefore, almost everywhere on $G_\epsilon$, $g(x) = P(x)$. Now we take a sequence of decreasing positive numbers converging to zero, say $\epsilon_1, \epsilon_2, \ldots$. Let $R_{\epsilon_i}$ be the subset of $G_{\epsilon_i}$ where $g(x)$ fails to equal $P(x)$. Then $R_{\epsilon_1}, R_{\epsilon_2}, \ldots$ is a nondecreasing sequence of sets, and $m(R_{\epsilon_i}) = 0$ for all $i$. Now $R_{\epsilon_1} + R_{\epsilon_2} + \cdots$ is the set where $g(x) \neq 0$ and $g(x) \neq P(x)$, and $m(R_{\epsilon_1} + R_{\epsilon_2} + \cdots) = \lim_{i\to\infty} m(R_{\epsilon_i}) = 0$. Therefore, almost everywhere where $g(x)$ does not equal zero, $g(x)$ equals $P(x)$.

Now let $P_\epsilon$ be the subset of $(-1, 1)$ where $P(x) \ge \epsilon$. Almost all points of $G_\epsilon$ are in $P_\epsilon$. Suppose $0 < m(G_\epsilon) < m(P_\epsilon)$. Then, denoting the complement of $G_\epsilon$ by $\bar G_\epsilon$, we can adjoin to $G_\epsilon$ a subset of $\bar G_\epsilon \cdot P_\epsilon$ of positive measure, to get a set $G_\epsilon'$. The polynomial $q(x)$ of degree at most $n$ defined by $\int_{G_\epsilon'} q(x)x^i\,dx = \int_{G_\epsilon'} g(x)x^i\,dx$, $i = 0, \ldots, n$, must be negative somewhere on $G_\epsilon'$, for if not we could decrease $\int_{-1}^{1} g^2$ by replacing $g(x)$ by $q(x)$ on $G_\epsilon'$ ($q(x)$ cannot equal $g(x)$ almost everywhere on $G_\epsilon'$, for $g(x) = P(x)$ on $G_\epsilon$ and zero on $G_\epsilon' - G_\epsilon$). But the polynomial defined on $G_\epsilon$ as $q(x)$ is defined on $G_\epsilon'$ is at least $\epsilon$ everywhere on $G_\epsilon$, and by making $m(G_\epsilon')$ close enough to $m(G_\epsilon)$, we can make certain that $q(x)$ is non-negative on $G_\epsilon'$, for the coefficients of $q(x)$ vary continuously as $m(G_\epsilon')$ grows. This contradiction proves that $m(G_\epsilon) = m(P_\epsilon)$. Taking a sequence $\epsilon_1, \epsilon_2, \ldots$ as above, the set $G$ where $g(x)$ is positive is $G_{\epsilon_1} + G_{\epsilon_2} + \cdots$, and the set $P$ where $P(x)$ is positive is $P_{\epsilon_1} + P_{\epsilon_2} + \cdots$. Then $m(G) = \lim m(G_{\epsilon_i}) = \lim m(P_{\epsilon_i}) = m(P)$. Since almost every point of $G$ is in $P$, we have that almost everywhere where $P(x)$ is positive, $g(x) = P(x)$.
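Theorem 2 also suggests how $g$ might be computed. Because $g = \max(P, 0)$ for some polynomial $P$ of degree at most $n$, it suffices to find coefficients $c_0, \ldots, c_n$ of $P$ for which $\max(P, 0)$ reproduces $\mu_0, \ldots, \mu_n$. The Python sketch below does this by Newton's method on a midpoint grid over $[-1, 1]$, starting from the unconstrained solution (the polynomial of degree at most $n$ with the given moments). The Newton scheme, the grid size and the function names are our own assumptions, not the author's construction, and the moment conditions of Section 2 are assumed to hold. For $n = 1$ and $\mu_1 = 0.2$ the answer is the untruncated density $0.5 + 0.3x$; for $\mu_1 = 0.5$ one checks by hand that $g(x) = (8/9)(x + 1/2)$ on $[-1/2, 1]$ and zero elsewhere.

```python
def poly(c, x):
    """Evaluate c[0] + c[1] x + ... + c[n] x**n by Horner's rule."""
    s = 0.0
    for a in reversed(c):
        s = s * x + a
    return s


def solve(A, b):
    """Solve A z = b for a small dense system (Gaussian elimination, partial pivoting)."""
    m = len(b)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, m):
            f = M[r][col] / M[col][col]
            for j in range(col, m + 1):
                M[r][j] -= f * M[col][j]
    z = [0.0] * m
    for r in range(m - 1, -1, -1):
        z[r] = (M[r][m] - sum(M[r][j] * z[j] for j in range(r + 1, m))) / M[r][r]
    return z


def min_l2_density(mu, grid=4000, iters=50, tol=1e-13):
    """Coefficients of P such that max(P, 0) on [-1, 1] has moments mu[0..n]."""
    n = len(mu) - 1
    h = 2.0 / grid
    xs = [-1.0 + (j + 0.5) * h for j in range(grid)]  # midpoints of [-1, 1]

    def leb(p):  # integral of x**p over [-1, 1]
        return 0.0 if p % 2 else 2.0 / (p + 1)

    # Start from the minimiser without the sign condition: a polynomial.
    c = solve([[leb(i + k) for k in range(n + 1)] for i in range(n + 1)], list(mu))
    for _ in range(iters):
        F = [-m for m in mu]                         # moment residuals
        J = [[0.0] * (n + 1) for _ in range(n + 1)]  # Jacobian: integrals over {P > 0}
        for x in xs:
            p = poly(c, x)
            if p > 0.0:
                pw = [x ** i for i in range(2 * n + 1)]
                for i in range(n + 1):
                    F[i] += pw[i] * p * h
                    for k in range(n + 1):
                        J[i][k] += pw[i + k] * h
        step = solve(J, F)
        c = [ci - si for ci, si in zip(c, step)]
        if max(abs(s) for s in step) < tol:
            break
    return c


print(min_l2_density([1.0, 0.5]))  # close to [4/9, 8/9]
```

The Newton iteration works on the dual of the minimisation problem: its residual is the gradient of the convex function $c \mapsto \frac12\int \max(P_c, 0)^2 - \sum_i c_i\mu_i$, whose stationary point gives exactly the form of $g$ stated in Theorem 2.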


3. Extensions. The results above can be generalized as follows. 

THEOREM 3. Given a bounded set $S$ of positive measure, and measurable functions $h_0(x), h_1(x), \ldots, h_n(x)$ such that $\int_S h_i^2(x)\,dx$ is finite for $i = 0, \ldots, n$, and numbers $m_0, m_1, \ldots, m_n$, suppose that there is at least one measurable function $f(x)$ with the following properties:

(a) $f(x) \ge 0$ almost everywhere on $S$,

(b) $\int_S f(x)h_i(x)\,dx = m_i$, $i = 0, \ldots, n$,

(c) $\int_S f^2(x)\,dx$ is finite.

Then there is a measurable function $g(x)$ with these properties, uniquely defined almost everywhere on $S$, such that $\int_S g^2(x)\,dx$ achieves the g.l.b. of $\int_S f^2(x)\,dx$ taken over the class of functions with properties (a), (b), and (c). Further, $g(x)$ is equal to a certain linear combination $L(x)$ of $h_0(x), \ldots, h_n(x)$ wherever $L(x)$ is non-negative, and $g(x)$ is equal to zero wherever $L(x)$ is negative.

PROOF. Exactly the same as in Section 2, except that the orthonormal sequence of functions starts with linear combinations of $h_0(x), \ldots, h_n(x)$ instead of linear combinations of $x^0, \ldots, x^n$.
REFERENCES 
[1] N. I. Achieser and M. Krein, On Certain Problems in the Theory of Moments, Kharkov, 1938.
[2] L. M. Graves, The Theory of Functions of Real Variables, McGraw-Hill, New York, 
1946. 







ON MINIMUM VARIANCE AMONG CERTAIN LINEAR FUNCTIONS OF 
ORDER STATISTICS 


By K. C. SEAL 


Calcutta University 


1. Summary. Suppose there are $n$ normal populations $N(\mu_i, 1)$, $i = 1, \ldots, n$, and that one random observation from each of these $n$ populations is given. Let $x_1 \le x_2 \le \cdots \le x_n$ be the observations when arranged in order of magnitude and let the corresponding $n$ random variables be denoted by $X_i$, $i = 1, \ldots, n$.

The following theorem is proved:

THEOREM. $\operatorname{Var}\bigl(\sum_{i=1}^{n} c_i X_i\bigr)$, where

$$\sum_{i=1}^{n} c_i = 1, \tag{1}$$

is minimum when $c_i = 1/n$, $i = 1, \ldots, n$.

The above theorem may be applied to provide a direct proof of the result that $\sum_{i=1}^{n} X_i$ is the best unbiased linear function of order statistics for estimating the sum $\sum_{i=1}^{n} \mu_i$.


2. Proof. Let $(\sigma_{ij})$ be the variance-covariance matrix of $X_i$ and $X_j$, $i = 1, \ldots, n$; $j = 1, \ldots, n$. The above theorem will follow from the following lemma.

LEMMA 1.

$$\sum_{i=1}^{n} \sigma_{ij} = 1, \qquad j = 1, \ldots, n. \tag{2}$$

PROOF. The joint probability density function (pdf) of $X_1, \ldots, X_n$ can be easily shown (see [2], pp. 12-17) to be given by

$$(2\pi)^{-n/2}\sum_{\pi}\exp\Bigl\{-\tfrac12\sum_{i=1}^{n}(x_i - \mu_{\alpha_i})^2\Bigr\}\,d\xi, \qquad x_1 \le \cdots \le x_n, \tag{3}$$

where $\pi = (\alpha_1, \ldots, \alpha_n)$ is a permutation of $(1, 2, \ldots, n)$, $\sum_\pi$ denotes the summation over $n!$ such permutations and $\xi$ represents the row vector $(x_1, \ldots, x_n)$.

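Let $g$ be any differentiable function such that the integrals involved exist, and we have identically in $u$,

$$\begin{aligned} Eg(X_j + u) &= \int\!\!\cdots\!\!\int_{x_1\le\cdots\le x_n} g(x_j + u)(2\pi)^{-n/2}\sum_{\pi}\exp\Bigl\{-\tfrac12\sum_{i=1}^{n}(x_i - \mu_{\alpha_i})^2\Bigr\}\,d\xi \\ &= \int\!\!\cdots\!\!\int_{x_1\le\cdots\le x_n} g(x_j)(2\pi)^{-n/2}\sum_{\pi}\exp\Bigl\{-\tfrac12\sum_{i=1}^{n}(x_i - u - \mu_{\alpha_i})^2\Bigr\}\,d\xi. \end{aligned} \tag{4}$$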


Received August 25, 1955. 







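Differentiating both sides of (4) with respect to $u$ and setting $u = 0$, we obtain

$$\begin{aligned} Eg'(X_j) &= \int\!\!\cdots\!\!\int_{x_1\le\cdots\le x_n} g(x_j)(2\pi)^{-n/2}\sum_{\pi}\Bigl[\sum_{i=1}^{n}(x_i - \mu_{\alpha_i})\Bigr]\exp\Bigl\{-\tfrac12\sum_{i=1}^{n}(x_i - \mu_{\alpha_i})^2\Bigr\}\,d\xi \\ &= E\Bigl[g(X_j)\sum_{i=1}^{n}(X_i - \mu_i)\Bigr], \end{aligned} \tag{5}$$

since $\sum_i (x_i - \mu_{\alpha_i}) = \sum_i (x_i - \mu_i)$ for every permutation. With $g(x) = x$, equation (5) gives the required lemma, because $E\sum_i X_i = \sum_i \mu_i$:

$$1 = E\Bigl[X_j\sum_{i=1}^{n}(X_i - \mu_i)\Bigr] = \sum_{i=1}^{n}\sigma_{ij}.$$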
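Lemma 1 is easy to check by simulation. The Python sketch below (ours; the function name, the means $0, 0.5, 2$, the number of replications and the seed are arbitrary choices) draws one observation from each $N(\mu_i, 1)$, sorts them, and estimates $\sum_i \sigma_{ij}$ for each $j$ as the sample covariance of $X_1 + \cdots + X_n$ with $X_j$; each estimate should be close to 1 whatever the means.

```python
import random


def row_sums_of_order_stat_cov(mus, reps=100_000, seed=1):
    """Monte Carlo estimates of sum_i cov(X_i, X_j), j = 1..n, where
    X_1 <= ... <= X_n are the ordered values of independent N(mu_i, 1)."""
    rng = random.Random(seed)
    n = len(mus)
    samples = [sorted(rng.gauss(m, 1.0) for m in mus) for _ in range(reps)]
    means = [sum(s[j] for s in samples) / reps for j in range(n)]
    total_mean = sum(means)
    # sum_i cov(X_i, X_j) = cov(X_1 + ... + X_n, X_j)
    return [
        sum((sum(s) - total_mean) * (s[j] - means[j]) for s in samples) / (reps - 1)
        for j in range(n)
    ]


print(row_sums_of_order_stat_cov([0.0, 0.5, 2.0]))  # each entry near 1
```

Since $X_1 + \cdots + X_n = x_1 + \cdots + x_n$ has variance $n$, the lemma says that this variance is shared equally among the $n$ order statistics, which is what the theorem then exploits.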


PROOF OF THE THEOREM.

$$\operatorname{Var}\Bigl(\sum_{i=1}^{n} c_i X_i\Bigr) = \sum_{i=1}^{n}\sum_{j=1}^{n} c_i c_j \sigma_{ij}. \tag{6}$$

Hence, to minimize (6) subject to the condition (1), we get the following equations to be satisfied by the $c_i$'s, $i = 1, \ldots, n$:

$$\sum_{i=1}^{n} c_i \sigma_{ij} = \lambda, \qquad j = 1, \ldots, n, \tag{7}$$

where $2\lambda$ is used as Lagrangian undetermined multiplier. From (2) and (7) it follows, on summing over the $n$ equations, that $\lambda = 1/n$, so that the desired values of the $c_i$'s, $i = 1, \ldots, n$, should satisfy

$$\sum_{i=1}^{n} c_i \sigma_{ij} = 1/n, \qquad j = 1, \ldots, n. \tag{8}$$

Comparing the equations (2) with (8) and noting that the matrix $(\sigma_{ij})$ is nonsingular, it follows that the solution of equation (8) is $c_i = 1/n$, $i = 1, \ldots, n$. This proves the theorem.
In the above theorem, when $\mu_1 = \mu_2 = \cdots = \mu_n$, Lemma 1 was derived by Lloyd [1]. Also we get in this special case the known result that $\operatorname{Var}(\sum_{i=1}^{n} c_i u_i)$, where $\sum_{i=1}^{n} c_i = 1$ and $u_1 \le u_2 \le \cdots \le u_n$ are $n$ ordered values from $N(\mu, 1)$, is minimum when $c_i = 1/n$, $i = 1, \ldots, n$.


ACKNOWLEDGMENT. My thanks are due to Prof. Wassily Hoeffding for indi- 
cating the above proof of Lemma 1. 


REFERENCES 
[1] E. H. Lloyd, “Least squares estimation of location and scale parameters using order statistics,” Biometrika, Vol. 39 (1952), pp. 88-95.
[2] K. C. Seal, “On a class of decision procedures for ranking means,” unpublished Ph.D. thesis (1954), University of North Carolina, Chapel Hill.







A NOTE ON “SOME FURTHER RESULTS IN SIMULTANEOUS 
CONFIDENCE INTERVAL ESTIMATION” 


By S. N. ROY


Institute of Statistics, University of North Carolina 


0. Summary. This note gives an explicit proof of a lemma in matrix theory repeatedly used in [2]; the lemma follows easily from other results, but an explicit proof of it may not be trivial. This note also gives closer confidence bounds for one of the several problems discussed in [2].


1. A matrix lemma. A lemma repeatedly used in [2] is the following:

$$c_{\min}(AB^{-1})\,c_{\min}(BC) \le \text{all } c(AC) \le c_{\max}(AB^{-1})\,c_{\max}(BC), \tag{1.1}$$

where $A$, $C$, $B$ (and hence $B^{-1}$) are symmetric positive definite matrices of order, say, $p$ each. This follows easily from

$$c_{\min}(M_1)\,c_{\min}(M_2) \le \text{all } c(M_1M_2) \le c_{\max}(M_1)\,c_{\max}(M_2), \tag{1.2}$$

where $M_1$ and $M_2$ are symmetric positive definite matrices, together with the following results:

(a) If $M$ ($p \times p$) is a symmetric positive definite matrix, then there exists a nonsingular triangular matrix $T$ such that $M = TT'$.

(b) Any nonzero characteristic root of $A(p \times q) \times B(q \times p)$ is a characteristic root of $B(q \times p) \times A(p \times q)$, and vice versa.

(c) If $M$ ($p \times p$) is symmetric positive definite and $Q$ ($p \times p$) is any nonsingular matrix, then $QMQ'$ is symmetric positive definite.

(1.2) is proved in [1]; (a), (b), and (c) are well-known matrix theorems, and (b) and (c) are also proved in [3].

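Turning now to the proof of (1.1) and using (1.2) and (a), (b), and (c), we put $B = TT'$ and note that

$$\begin{aligned} c_{\max}(AB^{-1})\,c_{\max}(BC) &= c_{\max}(AT'^{-1}T^{-1})\,c_{\max}(TT'C) \\ &= c_{\max}(T^{-1}AT'^{-1})\,c_{\max}(T'CT) \ge c_{\max}(T^{-1}ACT), \end{aligned} \tag{1.3}$$

i.e., $\ge c_{\max}(AC)$, since $T^{-1}ACT$ and $AC$ have the same characteristic roots. The other side of the inequality in (1.1) follows in a similar fashion, and this completes the proof of (1.1).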
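A quick numerical check of (1.1) is possible for $p = 2$, where characteristic roots follow from the trace and determinant. The Python sketch below (ours; the function names, the way random symmetric positive definite matrices are generated, and the tolerance are assumptions) draws random $A$, $B$, $C$ and verifies that both roots of $AC$ lie between $c_{\min}(AB^{-1})c_{\min}(BC)$ and $c_{\max}(AB^{-1})c_{\max}(BC)$. All three products have real positive roots because each is similar to a symmetric positive definite matrix.

```python
import math
import random


def mat_mul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def mat_inv(X):
    det = X[0][0] * X[1][1] - X[0][1] * X[1][0]
    return [[X[1][1] / det, -X[0][1] / det], [-X[1][0] / det, X[0][0] / det]]


def char_roots(X):
    """Characteristic roots (ascending) of a 2x2 matrix known to have real roots."""
    tr = X[0][0] + X[1][1]
    det = X[0][0] * X[1][1] - X[0][1] * X[1][0]
    disc = math.sqrt(max(tr * tr - 4.0 * det, 0.0))  # clip tiny rounding error
    return (tr - disc) / 2.0, (tr + disc) / 2.0


def random_spd(rng):
    """L L' with L lower triangular, positive diagonal: symmetric positive definite."""
    a, b, c = rng.uniform(0.5, 2.0), rng.uniform(-1.0, 1.0), rng.uniform(0.5, 2.0)
    return mat_mul([[a, 0.0], [b, c]], [[a, b], [0.0, c]])


def check_lemma_1_1(trials=2000, seed=0, rel_tol=1e-9):
    rng = random.Random(seed)
    for _ in range(trials):
        A, B, C = random_spd(rng), random_spd(rng), random_spd(rng)
        lo1, hi1 = char_roots(mat_mul(A, mat_inv(B)))
        lo2, hi2 = char_roots(mat_mul(B, C))
        for root in char_roots(mat_mul(A, C)):
            if not (lo1 * lo2 * (1 - rel_tol) <= root <= hi1 * hi2 * (1 + rel_tol)):
                return False
    return True


print(check_lemma_1_1())  # True
```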


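2. Closer bounds on the $c(\Sigma_1\Sigma_2^{-1})$'s than those given in [2]. If $S_1$ and $S_2$ stand for the dispersion matrices of random samples of sizes $n_1$ and $n_2$ from $N(\xi_i, \Sigma_i)$ (with $i = 1, 2$), the constants $c_{1\alpha}(p, n_1 - 1, n_2 - 1) = c_{1\alpha}$ and $c_{2\alpha}(p, n_1 - 1, n_2 - 1) = c_{2\alpha}$, say, are defined in [2] such that

$$P\bigl[c_{1\alpha} \le \text{all } c(S_1S_2^{-1}) \le c_{2\alpha} \bigm| \Sigma_1 = \Sigma_2\bigr] = 1 - \alpha. \tag{2.1}$$

It is well known, [2] and [3], that if $\Sigma_1 \neq \Sigma_2$ and if $\gamma_1 \le \gamma_2 \le \cdots \le \gamma_p$ stand for the $c(\Sigma_1\Sigma_2^{-1})$'s and $D_\gamma$ for a diagonal matrix whose diagonal elements are $\gamma_1, \ldots, \gamma_p$,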


Received July 19, 1955 







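then the $c(D_\gamma^{-1/2}S_1D_\gamma^{-1/2}S_2^{-1})$'s have the same joint distribution as that of the $c(S_1S_2^{-1})$'s under the null hypothesis $\Sigma_1 = \Sigma_2$. Thus, we have

$$P\bigl[c_{1\alpha} \le \text{all } c(D_\gamma^{-1/2}S_1D_\gamma^{-1/2}S_2^{-1}) \le c_{2\alpha} \bigm| \Sigma_1 \neq \Sigma_2\bigr] = 1 - \alpha. \tag{2.2}$$

The statement under the probability symbol is equivalent to

$$\frac{1}{c_{2\alpha}} \le \text{all } c(S_2D_\gamma^{1/2}S_1^{-1}D_\gamma^{1/2}) \le \frac{1}{c_{1\alpha}}. \tag{2.3}$$

Now, noting that $S_1$, $S_2$ (and hence $S_2^{-1}$), and $D_\gamma^{1/2}S_1^{-1}D_\gamma^{1/2}$ are symmetric positive definite matrices and using (1.1), we have

$$c_{\max}(S_1S_2^{-1})\,c_{\max}(S_2D_\gamma^{1/2}S_1^{-1}D_\gamma^{1/2}) \ge c_{\max}(S_1D_\gamma^{1/2}S_1^{-1}D_\gamma^{1/2}). \tag{2.4}$$

Now, putting $S_1 = OO'$ and remembering [1] that if $A$ ($p \times p$) is a matrix with real roots, then

$$c_{\min}(AA') \le c^2(A) \le c_{\max}(AA'), \tag{2.5}$$

we have

$$\begin{aligned} c_{\max}(OO'D_\gamma^{1/2}O'^{-1}O^{-1}D_\gamma^{1/2}) &= c_{\max}\bigl[(O'D_\gamma^{1/2}O'^{-1})(O'D_\gamma^{1/2}O'^{-1})'\bigr] \\ &\ge c_{\max}^2(O'D_\gamma^{1/2}O'^{-1}), \end{aligned} \tag{2.6}$$

i.e., $\ge c_{\max}(D_\gamma) = \gamma_p$. Combining (2.4) and (2.6), we have

$$c_{\max}(S_2D_\gamma^{1/2}S_1^{-1}D_\gamma^{1/2}) \ge \gamma_p / c_{\max}(S_1S_2^{-1}), \tag{2.7}$$

and, in a similar fashion, we also have

$$c_{\min}(S_2D_\gamma^{1/2}S_1^{-1}D_\gamma^{1/2}) \le \gamma_1 / c_{\min}(S_1S_2^{-1}). \tag{2.8}$$

Thus, it is easy to check that (2.3) implies

$$\frac{c_{\max}(S_1S_2^{-1})}{c_{1\alpha}} \ge \text{all } c(\Sigma_1\Sigma_2^{-1}) \ge \frac{c_{\min}(S_1S_2^{-1})}{c_{2\alpha}}, \tag{2.9}$$

which is therefore a confidence statement with a probability greater than or equal to $1 - \alpha$.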


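Now, as to the closeness of these bounds compared to those of [2], we note the following. Using (1.2), we have

$$c_{\max}(S_1S_2^{-1}) \le c_{\max}(S_1)\,c_{\max}(S_2^{-1}), \tag{2.10}$$

i.e., $\le c_{\max}(S_1)/c_{\min}(S_2)$, and

$$c_{\min}(S_1S_2^{-1}) \ge c_{\min}(S_1)/c_{\max}(S_2).$$

Thus, (2.9) implies

$$\frac{1}{c_{1\alpha}}\,\frac{c_{\max}(S_1)}{c_{\min}(S_2)} \ge \text{all } c(\Sigma_1\Sigma_2^{-1}) \ge \frac{1}{c_{2\alpha}}\,\frac{c_{\min}(S_1)}{c_{\max}(S_2)}, \tag{2.11}$$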










which is therefore a confidence statement with a confidence coefficient greater than or equal to the confidence coefficient of (2.9). Thus, if (2.3) has a probability $1 - \alpha$, (2.9) has a probability $1 - \beta \ge 1 - \alpha$, and if (2.9) has a probability $1 - \beta$, then (2.11) has a probability $1 - \gamma \ge 1 - \beta$. The bounds in (2.11) are the ones obtained in [2] in a different way.


REFERENCES 
[1] S. N. Roy, “A useful theorem in matrix theory,” Proc. Amer. Math. Soc., Vol. 5 (1954), pp. 635-638.
[2] S. N. Roy, “On some further results in simultaneous confidence interval estimation,” Ann. Math. Stat., Vol. 25 (1954), pp. 752-761.
[3] S. N. Roy, “A report on some aspects of multivariate analysis,” Institute of Statistics, University of North Carolina, Mimeograph Series No. 121.




A NOTE ON THE NORMAL DISTRIBUTION 
By SEYMOUR GEISSER¹
National Bureau of Standards 


1. It is well known that a necessary and sufficient condition for the independence of the sample mean and variance is that the parent population be normal. This was first shown by R. C. Geary [2], and later Lukacs [3] gave a somewhat simpler proof using characteristic functions.

By using the method of Lukacs one can derive a similar theorem concerning the sample mean and the mean square successive difference.

2. Let $x_1, \ldots, x_n$ be independent and identically distributed with density $f(x)$, mean $\mu$ and variance $\sigma^2$. Let

$$\bar x = n^{-1}\sum_{j=1}^{n} x_j, \qquad \delta^2 = (n-1)^{-1}\sum_{j=1}^{n-1}(x_{j+1} - x_j)^2.$$

The following theorem can be proved:

THEOREM: A necessary and sufficient condition that $f(x)$ be the normal density is that $\bar x$ and $\delta^2$ be independent.

PROOF: If $\bar x$ and $\delta^2$ are independent, then we follow Lukacs [3] step for step, replacing $s^2$ by

$$\delta^2 = (n-1)^{-1}\Bigl[2\sum_{r=1}^{n} x_r^2 - x_1^2 - x_n^2 - 2\sum_{r=1}^{n-1} x_r x_{r+1}\Bigr],$$


Received on June 14, 1955.
¹ Now at the National Institute of Mental Health.







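so that

$$\varphi(t_1, t_2) = \int\!\cdots\!\int e^{i(t_1\bar x + t_2\delta^2)} f(x_1)\cdots f(x_n)\,dx_1\cdots dx_n = \varphi_1(t_1)\,\varphi_2(t_2),$$

or

$$\left.\frac{\partial\varphi(t_1, t_2)}{\partial t_2}\right|_{t_2=0} = \varphi_1(t_1)\left.\frac{d\varphi_2(t_2)}{dt_2}\right|_{t_2=0}.$$

It is easy to show that

$$\varphi_1(t_1) = [\psi(t_1/n)]^n, \qquad \text{where} \qquad \psi(t) = \int e^{itx} f(x)\,dx,$$

and

$$\left.\frac{\partial\varphi(t_1, t_2)}{\partial t_2}\right|_{t_2=0} = 2i\,[\psi(t_1/n)]^{n-1}\int x^2 e^{it_1x/n} f(x)\,dx - 2i\,[\psi(t_1/n)]^{n-2}\Bigl[\int x e^{it_1x/n} f(x)\,dx\Bigr]^2.$$

This leads to the same differential equation

$$\frac{1}{\psi(t)}\frac{d^2\psi}{dt^2} - \left[\frac{1}{\psi(t)}\frac{d\psi}{dt}\right]^2 = -\sigma^2$$

obtained by Lukacs, whose solution is the characteristic function of the normal distribution.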

The converse is a special case of a lemma by Daly [1], which says that $\bar x$ and $g(x_1, \ldots, x_n)$ are independent in the normal case if $g(x_1, \ldots, x_n) = g(x_1 + a, \ldots, x_n + a)$. Since $\delta^2$ is invariant under a translation, the theorem is proved.
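The theorem can be illustrated, though of course not proved, by simulation. The Python sketch below (ours; the names, $n = 5$, the replication count and the seed are arbitrary) estimates the correlation between $\bar x$ and $\delta^2$ for normal and for exponential samples. Zero correlation is only a consequence of independence, not a proof of it, but a direct computation gives $\operatorname{cov}(\bar x, \delta^2) = 2\mu_3/n$, with $\mu_3$ the third central moment, so any skewed parent produces dependence; for the exponential parent with $n = 5$ the population correlation works out to about 0.63.

```python
import random


def corr(xs, ys):
    """Sample (Pearson) correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def mean_and_delta2(sample):
    """Sample mean and delta^2 = (n-1)^(-1) * sum of squared successive differences."""
    n = len(sample)
    mean = sum(sample) / n
    delta2 = sum((sample[j + 1] - sample[j]) ** 2 for j in range(n - 1)) / (n - 1)
    return mean, delta2


def simulated_corr(draw, n=5, reps=20_000, seed=3):
    rng = random.Random(seed)
    pairs = [mean_and_delta2([draw(rng) for _ in range(n)]) for _ in range(reps)]
    return corr([m for m, _ in pairs], [d for _, d in pairs])


normal_corr = simulated_corr(lambda rng: rng.gauss(0.0, 1.0))
expo_corr = simulated_corr(lambda rng: rng.expovariate(1.0))
print(f"normal: {normal_corr:+.3f}   exponential: {expo_corr:+.3f}")
```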


REFERENCES 


[1] J. F. Daly, “On the use of the sample range in an analogue of Student's t-test,” Ann. Math. Stat., Vol. 17 (1946), pp. 71-74.

[2] R. C. Geary, “Distribution of Student's ratio for nonnormal samples,” J. Roy. Stat. Soc. Supp., Vol. 3 (1936), No. 2.

[3] E. Lukacs, “A characterization of the normal distribution,” Ann. Math. Stat., Vol. 13 (1942), pp. 91-93.










CORRECTION TO “AN APPLICATION OF INFORMATION THEORY TO MULTIVARIATE ANALYSIS, II”


By S. KULLBACK
The George Washington University


In the paper cited in the title (Ann. Math. Stat., Vol. 27 (1956)):

p. 122, line 7, delete ‘maximizing information’ and replace by ‘discriminating 
between a null hypothesis and the alternative hypothesis by using that distri- 
bution corresponding to the alternative hypothesis which for the sample values 
provides the least information for discrimination’; 

p. 123, line 3, between ‘information’ and ‘in’ insert ‘for linear discriminant 
functions’ ; 

p. 124, line 6, same as p. 122, line 7 above; 

p. 124, line 11, change ‘maximum’ to ‘minimum’; 

p. 124, line 13, change ‘maximizing’ to ‘minimizing’; 

p. 124, (3.8), add ‘$a = \int y e^{ty} g_2(y)\,dy \big/ M_2(t) = \dfrac{dM_2(t)}{dt}\Big/ M_2(t)$’;
p. 125, line 3, delete ‘maximum’ and replace by ‘minimum /*’; 
p. 125, immediately following (3.12) add ‘It is readily verified that g*(y) is 
normal when g2(y) is normal.’; 
p. 130, line 23, insert ‘(’ between ‘=’ and ‘X,)’. 




ABSTRACTS OF PAPERS 


(Abstracts of papers submitted for the Seattle meeting of the Institute, August 20-27, 1956) 


1. On the Studentized Largest and Smallest χ², K. V. RAMACHANDRAN, University of Baroda, India.


Let $s_1^2, s_2^2, \ldots, s_k^2$ be $k$ independent $\chi^2$ variables with $m$ d.f. each. Let $s^2$ be another independent $\chi^2$ variable with $n$ d.f. Then the Studentized largest $\chi^2$ is defined as $u = s_{\max}^2/s^2$, where $s_{\max}^2 = \max(s_1^2, s_2^2, \ldots, s_k^2)$. Similarly the Studentized smallest $\chi^2$ is defined as $v = s_{\min}^2/s^2$, where $s_{\min}^2 = \min(s_1^2, s_2^2, \ldots, s_k^2)$. Using methods given in an earlier paper, the upper and lower 5 percent points of $u$ and $v$ are given for different values of $m$, $n$ and $k$. These statistics have been found to be useful in several situations including control of quality, simultaneous confidence interval estimation, testing for normality against uniform distribution, etc. (Received April 16, 1956.)


2. The Linear Hypothesis, Information, and the Analysis of Variance, (Preliminary Report), CHESTER H. MCCALL, JR., The George Washington University.


The concepts of “information” (designated by ‘‘7’’) and ‘‘mean information per observa- 
tion’”’ (designated by ‘‘J’’) for differentiation between two hypotheses first appeared in 







“On Information and Sufficiency” (S. Kullback and R. A. Leibler, Ann. of Math. Stat., Vol. 22 (1951), pp. 79-86). Since that time numerous articles have appeared on the same subject, discussing applications of Information to multivariate analysis as well as analytical developments of distributions of certain of the Information statistics. In this paper, the linear hypothesis and its application to the analysis of variance is considered from the viewpoint of mean information per observation to discriminate between the usual null hypothesis and composite generalized alternative hypothesis. The information measure of divergence between two hypotheses, $J(1:2)$, is introduced. Best unbiased estimates of the parameters are employed in estimating the mean divergence, and we call this estimate $\hat J(1:2)$ (read “J-caret”). The statistic $\hat J(1:2)$ is identical to $k_1F$, where $k_1$ is some appropriate degree of freedom and $F$ has the analysis of variance distribution with $k_1$ and $k_2$ degrees of freedom. Applications are made to one-way, two-way, two-way with replication, and Latin Square designs. It is shown that the given method applies to orthogonal and nonorthogonal designs. (Received April 23, 1956.)

3. A Sequential Multiple Decision Procedure for Selecting the Multinomial Event with the Largest Probability, (Preliminary Report), R. E. Bechhofer, Cornell University, and M. Sobel, Bell Telephone Laboratories.


Let $x_j = (x_{1j}, x_{2j}, \ldots, x_{kj})$ be independent vector-observations from a single multinomial population with a common unknown probability vector $p = (p_1, p_2, \ldots, p_k)$; here $p_i$ is the probability of the event $E_i$ ($0 < p_i < 1$, $\sum_{i=1}^{k} p_i = 1$) and $x_{ij} = 1$ or 0 according as $E_i$ does or does not occur at the $j$th stage ($i = 1, 2, \ldots, k$; $j = 1, 2, \ldots$). Let $p_{[1]} \le p_{[2]} \le \cdots \le p_{[k]}$ denote the ranked probabilities; let $\theta = p_{[k]}/p_{[k-1]} \ge 1$. A sequential procedure is proposed which guarantees a probability of at least $P^*$ ($1/k < P^* < 1$) of selecting the event associated with $p_{[k]}$ whenever $\theta \ge \theta^*$ ($1 < \theta^* < \infty$); the constants $P^*$ and $\theta^*$ are preassigned. Let $y_{im} = \sum_{j=1}^{m} x_{ij}$ ($i = 1, 2, \ldots, k$) and let $y_{[1]m} \le y_{[2]m} \le \cdots \le y_{[k]m}$ denote their ranked values. Let $E_{[i]m}$ denote the event associated with $y_{[i]m}$ at the $m$th stage ($i = 1, 2, \ldots, k$). Procedure: “At the $m$th stage ($m = 1, 2, \ldots$) take the vector observation $x_m$ and compute $W_m = \sum_{i=1}^{k-1} [1/\theta^*]^{y_{[k]m} - y_{[i]m}}$. If $W_m \le (1 - P^*)/P^*$, stop and select $E_{[k]m}$ (or if at this stage $y_{[k-t]m} < y_{[k-t+1]m} = \cdots = y_{[k]m}$, select one of $E_{[k-t+1]m}, \ldots, E_{[k]m}$ using a randomized device which assigns probability $1/t$ to each of them); if $W_m > (1 - P^*)/P^*$, take the vector $x_{m+1}$ and compute $W_{m+1}$.” This procedure has probability one of terminating. It can be generalized to handle problems such as obtaining a complete ranking of the $k$ probabilities. (Research supported in part by the U.S. Air Force through the Office of Scientific Research of the ARDC.) (Received May 4, 1956.)
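The stopping rule quoted above can be simulated directly. The sketch below is a minimal version, assuming the statistic $W_m = \sum_{i=1}^{k-1}(1/\theta^*)^{y_{[k]m} - y_{[i]m}}$ built from the cumulative event counts; the function names and the `max_stages` safety cap are our own illustrative choices, not part of the abstract.

```python
import random

def bks_statistic(counts, theta_star):
    """W_m = sum over the k-1 non-leading ranks of (1/theta*)^(y_[k]m - y_[i]m)."""
    ranked = sorted(counts)
    top = ranked[-1]
    return sum((1.0 / theta_star) ** (top - y) for y in ranked[:-1])

def select_most_probable(p, theta_star, p_star, rng, max_stages=100_000):
    """Run the stopping rule on simulated trials; return (chosen index, number of stages)."""
    k = len(p)
    counts = [0] * k
    threshold = (1 - p_star) / p_star
    for m in range(1, max_stages + 1):
        i = rng.choices(range(k), weights=p)[0]  # exactly one event E_i occurs per stage
        counts[i] += 1
        if bks_statistic(counts, theta_star) <= threshold:
            top = max(counts)
            tied = [j for j, c in enumerate(counts) if c == top]
            return rng.choice(tied), m  # randomize among the t tied leaders, prob 1/t each
    raise RuntimeError("no decision within max_stages")
```

With $p = (0.8, 0.1, 0.1)$ the true ratio $\theta = 8$ exceeds $\theta^* = 2$, so a long run of repetitions should pick event 0 at least a fraction $P^*$ of the time.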


4. On the Existence of Uniformly Efficient Estimates, R. R. Bahadur, University of Chicago.


Suppose that the sample point $x$ is distributed according to some one of a given set $P$ of probability measures $p$. A real-valued function $g$ on $P$ is said to be $L_2$-estimable if there exists at least one unbiased estimate of $g$, say $t(x)$, such that the variance of $t$ is finite for each $p$. Let $y = S(x)$ be a minimal sufficient statistic for $P$. Then $y$ is said to be $L_2$-complete (for $P$) if $t = 0$ is essentially the only unbiased estimate of zero that depends on $x$ only through $S$ and that has finite variance for each $p$. As may be seen from an argument by Lehmann and Scheffé (“Completeness, similar regions, and unbiased estimation, Part I,” Sankhyā, 10 (1950), pp. 305-340), if $y$ is $L_2$-complete, then every $L_2$-estimable parameter $g$ possesses an unbiased estimate of uniformly minimum variance. The main conclusion of the paper is that, conversely, if every $L_2$-estimable $g$ possesses an unbiased estimate of uniformly minimum variance, then $y$ must be $L_2$-complete. It is also shown by an example that in general this converse does not hold with “$L_2$-estimable” and “$L_2$-complete” replaced by the parallel $L_1$ concepts. (Received May 7, 1956.)


5. On the Distribution of Ranks and of Certain Rank Order Statistics, Meyer Dwass, Northwestern University and Stanford University.


Suppose $X_1, \ldots, X_m$ and $X_{m+1}, \ldots, X_N$ are two independent samples from two possibly different populations, $R_1, \ldots, R_m$ are the ranks of the first $m$ observations in the combined sample, and $R_{m+1}, \ldots, R_N$ are the ranks of the remaining observations. Various moment-generating functions involving these ranks are derived, including that of the Wilcoxon statistic. The asymptotic distribution of a finite number of ranks is derived as $N \to \infty$. The remainder of the paper studies certain aspects of the distribution theory of rank order statistics of the form $\sum_{i=1}^{m} f_N(R_i/N)$. The Wilcoxon statistic and Hoeffding's $c_1$-statistic are special cases of such a statistic. Many previous studies have been devoted to showing asymptotic normality. The main purpose here is to show that for certain combinations of sample sizes $m, n$, the limiting distribution may be non-normal as $m \to \infty$, $n \to \infty$, and $m/N \to 0$. (Work performed under Office of Naval Research Contract Nonr-225(21).) (Received May 8, 1956.)
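A rank order statistic of the form $\sum_{i=1}^{m} f_N(R_i/N)$ is easy to compute once the pooled ranks are known. The sketch below assumes no ties and passes $N$ to the score function explicitly. Choosing $f_N(u) = Nu$ recovers the Wilcoxon rank sum, and the brute-force null mean over all $\binom{N}{m}$ rank sets should match the textbook value $m(N+1)/2$. All function names are hypothetical.

```python
from itertools import combinations

def ranks_of_first_sample(x, y):
    """Ranks R_1..R_m of the x's in the pooled sample (assumes no ties)."""
    pooled = sorted(x + y)
    position = {value: i + 1 for i, value in enumerate(pooled)}
    return [position[v] for v in x]

def rank_order_statistic(x, y, f):
    """Sum over the first sample of f(R_i / N, N)."""
    n_total = len(x) + len(y)
    return sum(f(r / n_total, n_total) for r in ranks_of_first_sample(x, y))

def wilcoxon_f(u, n_total):
    # f_N(u) = N * u returns the rank itself, so the sum is the Wilcoxon rank sum
    return n_total * u

def exact_null_mean(m, n, f):
    """Average of sum f(R_i/N) over all C(N, m) equally likely rank sets."""
    n_total = m + n
    sets = list(combinations(range(1, n_total + 1), m))
    return sum(sum(f(r / n_total, n_total) for r in s) for s in sets) / len(sets)
```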


6. Contributions to Distribution-Free Population Comparisons, William E. Perrault and Waldo A. Vezeau, St. Louis University.


Distribution-free techniques based on the median of the combined samples and the num- 
ber of runs in each sample are employed to test, by means of the chi-square criterion, the 
null hypothesis that two, or in general k, samples have come from the same population. 
An extension is carried out to the two-criterion experiment. Procedures are supplied for the 
following: paired or unpaired replicates or groups of replicates, testing whether the superi- 
ority of one material over the other is consistent throughout the range of conditions studied, 
confidence limits for the mean difference between treatments. The power curve for the 
two-sample test is given. (Received June 4, 1956.) 
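The median-based part of the procedure can be sketched as the classical $k$-sample median test: classify each observation as above or not above the combined median and apply the chi-square criterion to the resulting $2 \times k$ table with $k - 1$ degrees of freedom. This sketch omits the runs component, and counting ties with the median as "not above" is our convention, not necessarily the authors'.

```python
from statistics import median

def median_test_chi_square(samples):
    """Chi-square statistic and d.f. of the k-sample median test (2 x k table)."""
    pooled = [x for s in samples for x in s]
    m = median(pooled)
    n_total = len(pooled)
    above_total = sum(x > m for x in pooled)
    chi2 = 0.0
    for s in samples:
        above = sum(x > m for x in s)
        observed = (above, len(s) - above)
        expected = (len(s) * above_total / n_total,
                    len(s) * (n_total - above_total) / n_total)
        chi2 += sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi2, len(samples) - 1
```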


7. A Probabilistic Model Describing Drop Count Data for Certain Cloud Chamber Experiments, Robert R. Read, University of California at Berkeley.


As a fast particle passes through a cloud chamber, it collides with the atoms present. Each collision results in the production of some ion pairs. One ion pair is formed directly as a result of the primary collision, and some secondary ion pairs may be formed in a kind of chain reaction if the transfer of energy in the primary encounter is large enough. Water droplets are made to condense on the ions by expanding the chamber. However, not all of the ions acquire droplets. Furthermore, positive and negative ions are not equally efficient as nuclei of condensation. Let $X$ represent the number of drops counted in a given length of track. Let $y$ represent the number of primary collisions, and let $\delta_k$ be the number of secondary collisions resulting from the $k$th primary collision. Let $u_{kj}$ ($v_{kj}$) be one or zero according to whether or not the positive (negative) ion in the $j$th ion pair of the $k$th primary collision acquires a droplet. Then $X = \sum_{k=1}^{y}\sum_{j=1}^{1+\delta_k}(u_{kj} + v_{kj})$. The distribution of $y$ is known to be Poisson. The distribution of secondaries is not known, but a Poisson law seems to work. The $u_{kj}$ and $v_{kj}$ are independent and have binomial distributions. The distribution of $X$ is deduced from these assertions. The model was successfully fitted to 21 tracks. Formulae for computing probabilities are displayed. Some refinements and modifications of the model are discussed. A method to estimate the parameters is presented. (Received June 15, 1956.)
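The compound structure of the drop count can be checked by simulation. The sketch assumes Poisson primaries with mean `lam_primary`, Poisson secondaries per primary with mean `mu_secondary`, one primary ion pair plus $\delta_k$ secondary pairs per collision, and independent droplet indicators with probabilities `p_pos` and `p_neg`. Under these assumptions Wald's identity gives $E(X) = \lambda(1 + \mu)(p_+ + p_-)$. The parameter names are ours.

```python
import math
import random

def poisson(rng, lam):
    """Poisson(lam) draw by Knuth's multiplication method (adequate for small lam)."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def drop_count(rng, lam_primary, mu_secondary, p_pos, p_neg):
    """One simulated drop count X for a track segment under the assumed model."""
    total = 0
    for _ in range(poisson(rng, lam_primary)):        # y primary collisions
        pairs = 1 + poisson(rng, mu_secondary)         # primary pair + delta_k secondaries
        for _ in range(pairs):
            total += (rng.random() < p_pos) + (rng.random() < p_neg)  # u_kj + v_kj
    return total
```

For example, with $\lambda = 5$, $\mu = 0.5$, $p_+ = 0.9$, $p_- = 0.6$ the sample mean of many draws should settle near $5 \times 1.5 \times 1.5 = 11.25$.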


8. Efficient Small Sample Nonparametric Median Tests with Bounded Significance Levels, John E. Walsh, Lockheed Aircraft Corporation.


Nonparametric tests and confidence intervals for the median of a continuous statistical population can always be obtained from a sample from that population by a sign test type procedure. However, investigation has indicated that nearly all of these results have moderate or low efficiencies. The only practically important exceptions occur for small sample size cases which are based on the largest and/or smallest of the sample values. Consequently, the sample size, significance level, and confidence coefficient values available for these efficient sign test type results are very limited. This paper presents some additional nonparametric results which appear to be reasonably efficient and which noticeably increase the available numbers of sample sizes, significance levels, and confidence coefficients. These results are for small samples and do not have exactly determined probability properties. That is, the significance level for a test is bounded between two specified numbers, but its exact value is unknown; similarly for confidence coefficient values. One-sided and two-sided tests and confidence intervals are presented. The upper and lower bounds for the one-sided results are only moderately close together. For the two-sided results, the bounds are quite close together unless the continuous population sampled is extremely unsymmetrical. The bound values almost always can be considered sufficiently close for application if a continuous monotonic transformation of variable is available which yields an approximately symmetrical population. (Received June 18, 1956.)
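For context, the baseline sign test type interval that the abstract improves on has an exactly known confidence coefficient: $[X_{(r)}, X_{(n+1-r)}]$ covers the median with probability $1 - 2P(\mathrm{Bin}(n, \tfrac12) \le r - 1)$. The sketch below computes that classical interval only. It is not Walsh's bounded-level procedure, and the function names are hypothetical.

```python
from math import comb

def sign_interval_coverage(n, r):
    """Exact coverage of [X_(r), X_(n+1-r)] for the median of a continuous population."""
    if not 1 <= r <= n // 2:
        raise ValueError("need 1 <= r <= n // 2")
    tail = sum(comb(n, i) for i in range(r)) / 2 ** n  # P(Bin(n, 1/2) <= r - 1)
    return 1 - 2 * tail

def sign_interval(sample, r):
    """Order-statistic interval [X_(r), X_(n+1-r)]."""
    xs = sorted(sample)
    return xs[r - 1], xs[len(xs) - r]
```

With $n = 10$ and $r = 2$ the coverage is $1 - 22/1024 \approx 0.9785$, which shows how coarse the available confidence levels are for small samples.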


9. Validity of Approximate Normality Values for $\mu \pm k\sigma$ Areas of Practical Type Continuous Populations, John E. Walsh, Lockheed Aircraft Corporation.


Let us consider a continuous statistical population with mean $\mu$ and standard deviation $\sigma$. A useful empirical relation which seems to be approximately valid for many such populations concerns the fraction of the population contained in the interval $\mu - k\sigma$ to $\mu + k\sigma$. This empirical relation states that the fraction contained in this interval is nearly equal to the value obtained by assuming that the population is normal. This paper presents results which show that the stated empirical relation is roughly valid for a rather general class of continuous populations. The populations of this class are referred to as practical type populations. The class considered consists of those populations with probability density functions which can be adequately represented by the first seven terms of their Edgeworth series expansion. The closeness of the approximation obtained by use of the empirical relation changes with the value of $k$. The preferable values for $k$ are investigated. Some possible applications of the empirical relation are outlined for what appears to be the most desirable value of $k$. These applications include quality control chart use, confidence intervals for $\sigma$, confidence intervals for the probability of success for a binomial population, and joint confidence regions for $\mu$ and $\sigma$. (Received June 18, 1956.)
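To see why the normal value can be a good approximation, the sketch below evaluates the fraction within $\mu \pm k\sigma$ under the standard second-order Edgeworth expansion of the CDF, $F(x) = \Phi(x) - \phi(x)[\tfrac{\gamma_1}{6}He_2(x) + \tfrac{\gamma_2}{24}He_3(x) + \tfrac{\gamma_1^2}{72}He_5(x)]$, with skewness $\gamma_1$ and excess kurtosis $\gamma_2$. Whether this matches the "first seven terms" used in the paper is an assumption. Over a symmetric interval the $\gamma_1$ term cancels because $He_2$ is even, and at $k = \sqrt{3}$ the kurtosis term vanishes. That is an observation about this sketch, not a claim about the paper's preferred $k$.

```python
from statistics import NormalDist

_STD = NormalDist()

def fraction_within_k_sigma(k, skew=0.0, excess_kurtosis=0.0):
    """Edgeworth approximation to P(mu - k*sigma < X < mu + k*sigma)."""
    phi = _STD.pdf(k)
    he3 = k ** 3 - 3 * k                 # Hermite polynomial He_3 (odd)
    he5 = k ** 5 - 10 * k ** 3 + 15 * k  # Hermite polynomial He_5 (odd)
    normal_part = 2 * _STD.cdf(k) - 1
    # the skewness term (He_2, even) cancels over a symmetric interval; odd terms double
    correction = 2 * phi * (excess_kurtosis * he3 / 24 + skew ** 2 * he5 / 72)
    return normal_part - correction
```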


10. Bayes Approach to Control of Fraction Defective, John V. Breakwell, North American Aviation, Incorporated.


Following Girshick and Rubin, the optimum quality control rule, when the fraction defective may jump at any time from a known low value to a known higher value and when the probability of this jump is also known a priori, belongs to a certain two-parameter family of control rules. The determination of the economic payoff as a function of these two parameters requires the solution of complicated integral-difference equations, which may be replaced by differential-difference equations when the fractions defective are sufficiently small. Some asymptotic formulas are obtained from these latter equations by letting one of the parameters grow large. Numerical comparison with some Monte Carlo simulations of the original scheme indicates that these asymptotic formulas may prove useful as approximations to the operating characteristics of the control scheme. (Received June 22, 1956.)


11. Incomplete Sufficient Statistics and Similar Tests, Robert A. Wijsman, University of California at Berkeley (introduced by David Blackwell).


If, under the hypothesis, a sufficient statistic is incomplete, the easily constructible similar tests of Neyman structure are not the only similar tests and may, in fact, be useless. For a class of exponential densities it can be proved, under some restrictions, that a sufficient statistic is incomplete if its dimension is greater than the number of parameters which specify the distribution under the hypothesis. The method of proof also provides a method of constructing a large class of similar tests. Applications are possible in the Behrens-Fisher problem and in the problem of testing hypotheses concerning the ratio of mean to standard deviation in a normal population. In the latter case we have $X_1, \ldots, X_n$: $N(\mu, \sigma^2)$ and independent; the hypothesis specifies the value of $\mu/\sigma$ to be $\rho_0$, and a minimal sufficient statistic is $T = (T_1, T_2)$ with $T_1 = \sum X_i$, $T_2 = \sum X_i^2$. Similar test functions can be constructed by virtue of the fact that for any function $g(t_1, t_2)$ satisfying some mild conditions, we have

$$\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\exp\left[-\frac{1}{2\sigma^2}t_2 + \frac{\rho_0}{\sigma}t_1\right]\left\{\frac{\partial^2 g}{\partial t_1^2} - 2\rho_0^2\frac{\partial g}{\partial t_2}\right\}dt_1\,dt_2 = 0$$

identically in $\sigma$. (Received June 25, 1956.)
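The identity can be checked numerically. Integrating by parts moves the derivatives onto the exponential kernel $e(t_1, t_2) = \exp[-t_2/(2\sigma^2) + \rho_0 t_1/\sigma]$, and $\partial^2 e/\partial t_1^2 + 2\rho_0^2\,\partial e/\partial t_2 = (\rho_0^2/\sigma^2 - \rho_0^2/\sigma^2)e = 0$ for every $\sigma$. The sketch below uses a trapezoid rule on a Gaussian bump $g(t_1, t_2) = \exp(-(t_1^2 + t_2^2))$. The choice of $g$, the grid, and the `coef` knob (2 reproduces the identity; other values should not vanish) are our assumptions for illustration.

```python
from math import exp

def identity_integral(sigma, rho0, coef=2.0, half_width=8.0, h=0.05):
    """Trapezoid value of the double integral for g(t1, t2) = exp(-(t1^2 + t2^2))."""
    steps = int(round(2 * half_width / h))
    total = 0.0
    for a in range(steps + 1):
        t1 = -half_width + a * h
        w1 = 0.5 if a in (0, steps) else 1.0
        for b in range(steps + 1):
            t2 = -half_width + b * h
            w2 = 0.5 if b in (0, steps) else 1.0
            g = exp(-(t1 * t1 + t2 * t2))
            g_11 = (4 * t1 * t1 - 2) * g  # d^2 g / d t1^2
            g_2 = -2 * t2 * g             # d g / d t2
            kernel = exp(-t2 / (2 * sigma ** 2) + rho0 * t1 / sigma)
            total += w1 * w2 * kernel * (g_11 - coef * rho0 ** 2 * g_2)
    return total * h * h
```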


12. Multi-Decision Problems for the Multivariate Exponential Family, Donald R. Truax, California Institute of Technology.


A study of a class of decision procedures was made when the underlying distribution belongs to the multivariate exponential family. The number of possible actions is assumed finite, and the case of two possible decisions is studied in detail. When the loss functions $L_1$ and $L_2$ have the property that the set where $L_1 - L_2$ changes sign is linear, it has been possible to characterize the Bayes solutions and obtain complete classes of decision procedures. Two main problems of this type are considered, and then various extensions of these problems are given. The first main problem involves deciding whether or not the unknown parameter point lies on a given $r$-dimensional subspace of an $n$-dimensional parameter space, and the second concerns a decision as to which half space a parameter belongs to. Applications of this theory are made to some classical problems of testing composite hypotheses. It is also shown that if the set where $L_1 - L_2$ changes sign is not the union of parallel linear sets, no nice characterization can be given to the Bayes procedures. Some problems where the number of possible actions is greater than two have been considered, and again complete classes were obtained. The question of admissibility has been studied for some of these problems. (Received June 26, 1956.)


13. Some Distributions Related to $D_n^+$, Z. W. Birnbaum and R. Pyke, University of Washington.


Let $U_1, \ldots, U_n$ be an ordered sample of a random variable which, without loss of generality, is assumed to have a uniform distribution on (0, 1), and let $D_n^+ = \max_{1 \le i \le n}\{i/n - U_i\}$. Consider the random variables $i^*$, $U^*$ defined by $D_n^+ = i^*/n - U_{i^*}$, $U^* = U_{i^*}$. Explicit formulae are obtained for the probabilities $\mathrm{Prob}\{U^* < u, i^* = j\}$, $0 < u \le 1$, $j = 1, \ldots, n$, and $p_j = \mathrm{Prob}\{i^* = j\}$, $j = 1, \ldots, n$. Some of the consequences of these formulae are: (i) $p_1 < p_2 < \cdots < p_n$; (ii) $\mathrm{Prob}\{U^* < u\} = u$, $0 < u \le 1$; (iii) $\lim_{n\to\infty} np_1 = e^{-1}$, $\lim_{n\to\infty} np_n = e$. The asymptotic distribution of $i^*$ is obtained by studying the random variable $a_n = i^*/n$, for which (iv) $\lim_{n\to\infty}\mathrm{Prob}\{a_n < u\} = u$, $0 < u < 1$. This statement is made more specific by showing that (v) $E(a_n) = \tfrac{1}{2}\{1 + n^{-n-1}n!\sum_{i=0}^{n-1} n^i/i!\}$ and (vi) $\sup_{0 \le u \le 1}|u - \mathrm{Prob}\{a_n < u\}| \le n^{-n-1}n!\sum_{i=0}^{n-1} n^i/i! = O(n^{-1/2})$. (Received July 2, 1956.)
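A quick Monte Carlo check of properties (i) and (ii) simulates $i^*$ and $U^*$ directly. For $n = 2$, $i^* = 2$ exactly when $U_2 - U_1 \le 1/2$, which has probability $3/4$. For $n = 3$, direct integration gives $p_1 = 4/27$, $p_2 = 7/27$, $p_3 = 16/27$. These are our own check values, not figures from the abstract, and the function names are hypothetical.

```python
import random

def d_plus_argmax(n, rng):
    """Return (i*, U*) for one uniform sample of size n, where D_n^+ = i*/n - U_(i*)."""
    u = sorted(rng.random() for _ in range(n))
    values = [(i + 1) / n - ui for i, ui in enumerate(u)]
    i_star = max(range(n), key=values.__getitem__) + 1
    return i_star, u[i_star - 1]

def estimate_p_and_mean_u(n, reps, seed=0):
    """Monte Carlo estimates of p_j = Prob{i* = j} (j = 1..n) and of E(U*)."""
    rng = random.Random(seed)
    counts = [0] * n
    u_total = 0.0
    for _ in range(reps):
        i_star, u_star = d_plus_argmax(n, rng)
        counts[i_star - 1] += 1
        u_total += u_star
    return [c / reps for c in counts], u_total / reps
```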


14. The Distribution of the Extreme Mahalanobis’ Distance from the Sample Mean, (Preliminary Report), Yvonne G.M.G. (Mrs. P. M.) Curt.e, University of British Columbia (introduced by S. W. Nash).


The problem of classification in multivariate analysis is considered. For the special bivariate case of three groups with the common dispersion matrix known, the distribution of the extreme Mahalanobis’ distance from the centroid of the groups has been derived, and the cumulative distribution has been partially tabulated. If the three groups are found to be heterogeneous, that is, if the sum of the Mahalanobis’ distances from the groups to their centroid exceeds the critical value of $\chi^2$ with 4 degrees of freedom, it is now possible, by comparing the extreme Mahalanobis’ distance with the tabulated value of the cumulative distribution, to test the hypothesis that the group associated with this extreme distance belongs to the same population as the other two groups. For the general case, the characteristic function of the joint distribution of the Mahalanobis’ distances from the sample mean has been derived. (Received July 2, 1956.)
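The heterogeneity screen that precedes the extreme-distance test can be sketched for the bivariate, three-group case. We assume each distance is weighted by its group size, $n_i(\bar x_i - \bar x)'\Sigma^{-1}(\bar x_i - \bar x)$, with $\bar x$ the size-weighted centroid. Under homogeneity the sum is then $\chi^2$ with $2(3 - 1) = 4$ degrees of freedom, whose upper tail has the closed form $e^{-x/2}(1 + x/2)$. The tabulated distribution of the extreme distance itself is not reproduced here.

```python
from math import exp

def chi2_4_sf(x):
    """Upper tail probability of chi-square with 4 d.f.: exp(-x/2) * (1 + x/2)."""
    return exp(-x / 2) * (1 + x / 2)

def mahalanobis_to_centroid(means, sizes, cov):
    """n_i (m_i - c)' S^{-1} (m_i - c) for bivariate group means and a known 2x2 covariance S."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    inv = ((d / det, -b / det), (-c / det, a / det))
    n_total = sum(sizes)
    centroid = [sum(n * m[j] for n, m in zip(sizes, means)) / n_total for j in range(2)]
    distances = []
    for n, m in zip(sizes, means):
        dx = (m[0] - centroid[0], m[1] - centroid[1])
        q = sum(dx[r] * inv[r][s] * dx[s] for r in range(2) for s in range(2))
        distances.append(n * q)
    return distances
```

If `chi2_4_sf(sum(distances))` falls below the chosen level, the groups are declared heterogeneous, and the largest entry of `distances` identifies the candidate group for the extreme-distance test.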


15. The Quadratic Birth Process, Peter W. M. John, University of New Mexico.


The simplest divergent birth process is that in which $\lambda_n = \lambda n^2$, $n(0) = 1$. The process is divergent in the sense that for all $t > 0$ there is a positive probability $p_\infty(t)$ that an infinite number of births have occurred. It is shown that this probability is given by $1 + 2\sum_{n=1}^{\infty}(-1)^n e^{-\lambda n^2 t}$, which is the Jacobi theta function $\vartheta_4(0, e^{-\lambda t})$. (Received July 2, 1956.)
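The explosion probability can be cross-checked by simulation. The holding time in state $n$ is exponential with rate $\lambda n^2$, so infinitely many births occur by time $t$ exactly when the explosion time $\sum_{n \ge 1}$ (holding times) is at most $t$. The sketch sums the theta series and compares it with a Monte Carlo estimate that truncates the sum after `levels` states. This truncation is our approximation and slightly overstates the probability.

```python
import random
from math import exp

def explosion_probability(lam, t, terms=200):
    """p_inf(t) = 1 + 2 * sum_{n>=1} (-1)^n exp(-lam n^2 t), i.e. theta_4(0, e^{-lam t})."""
    return 1 + 2 * sum((-1) ** n * exp(-lam * n * n * t) for n in range(1, terms + 1))

def simulated_explosion_probability(lam, t, reps=2000, levels=500, seed=0):
    """Monte Carlo: explosion time = sum of Exp(lam n^2) holding times, truncated at `levels`."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        total = sum(rng.expovariate(lam * n * n) for n in range(1, levels + 1))
        hits += total <= t
    return hits / reps
```

For $\lambda = t = 1$ the series gives about 0.3006.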




16. Sequential Distribution-free Tolerance Regions, Sam C. Saunders, University of Washington.


Let $X_1, X_2, \ldots$ be independent observations of a random variable $X$, and let $Y_{j,n}$ be the $j$th order statistic of the first $n$ of these observations $X_1, X_2, \ldots, X_n$, with the conventions $Y_{0,n} = -\infty$, $Y_{n+1,n} = +\infty$. Denoting by $A_n$ some subset of $\{1, 2, \ldots, n+1\}$, for $n = 1, 2, \ldots$, we form the union of closed intervals $\mathcal{A}_n = \bigcup_{j \in A_n}[Y_{j-1,n}, Y_{j,n}]$. Let $a_1, a_2, \ldots$ be a sequence of integers $\ge 0$. We agree to continue making additional observations until, for the first time, after $X_1, \ldots, X_n$ have been obtained and $\mathcal{A}_n$ has been determined, the additional observations $X_{n+1}, \ldots, X_{n+a_n}$ are all contained in $\mathcal{A}_n$. If this happens for $n = N$, the set $\mathcal{A}_N$ is called a “tolerance region,” and the random variable $Q = \mathrm{Prob}\{X \in \mathcal{A}_N\}$ is called the “coverage.” The distribution of $Q$, which depends on the choice of $\{A_n\}$ and $\{a_n\}$, is obtained for various choices of these sequences. Expected sample sizes are found, and criteria are proposed for comparing such sequential procedures. (Received July 2, 1956.)


17. Definite Quadratic Forms and Discontinuous Factor, André G. Laurent, Michigan State University.


In many instances, the derivation of the distribution of a positive quadratic form $X'AX$, with $X$ $n$-dimensional and $f(X)$ distributed, is greatly simplified by using Dirichlet’s discontinuous factor for the $n$-dimensional sphere: $P(X'AX \le R^2) = P(Y'Y \le R^2) = E(U)$, with

$$U = \left(\frac{R}{2\pi}\right)^{n/2}\int\cdots\int (TT')^{-n/4}e^{iTY}J_{n/2}\left[R(TT')^{1/2}\right]dT,$$

where $J_{n/2}$ denotes the Bessel function of the first kind and order $n/2$, and $E(\cdot)$ is taken with respect to the df $g(Y)$ of $Y$. Under the usual assumptions relative to convergence and order of integration,

$$P(Y'Y \le R^2) = \left(\frac{R}{2\pi}\right)^{n/2}\int\cdots\int h(T)(TT')^{-n/4}J_{n/2}\left[R(TT')^{1/2}\right]dT,$$

which is a generalized multivariate Hankel transform of the characteristic function $h(T)$ of $Y$, i.e.,

$$P(Y'Y \le R^2) = \left(\frac{R^2}{4\pi}\right)^{n/2}C\,E\left\{\sum_{k=0}^{\infty}\frac{(-1)^k(R^2TT'/4)^k}{k!\,\Gamma(n/2 + k + 1)}\right\},$$

where $E(\cdot)$ is taken with respect to the pseudo “df” $h(T)/C$. In case $g(Y)$ is $N(0, \Sigma)$, one obtains

$$P = \left(\frac{R^2}{2}\right)^{n/2}|\Sigma|^{-1/2}E\left[\sum_{k=0}^{\infty}\frac{(-1)^k(R^2/2)^k(TT'/2)^k}{k!\,\Gamma(n/2 + k + 1)}\right],$$

where $E(\cdot)$ is taken with respect to $h(T)/C = N(0, \Sigma^{-1})$. If, further, $n = 2$ and $a$, $b$ are the eigenvalues of $\Sigma$, then $E[(TT'/2)^k] = a^{-k}k!\,{}_2F_1(-k, \tfrac{1}{2}; 1; 1 - a/b)$, i.e.,

$$P = \sum_{k=0}^{\infty}\left(\frac{R^2}{2a}\right)^{k+1}\left(\frac{a}{b}\right)^{1/2}\frac{(-1)^k}{(k+1)!}\sum_{j=0}^{k}\binom{k}{j}\binom{-1/2}{j}\left(1 - \frac{a}{b}\right)^j.$$

In case $g(Y)$ is $N(m, \Sigma)$,

$$P = R^{n/2}|\Sigma|^{-1/2}\int\cdots\int e^{iTm}N(0, \Sigma^{-1})(TT')^{-n/4}J_{n/2}\left[R(TT')^{1/2}\right]dT,$$

i.e., $P = R^{n/2}|\Sigma|^{-1/2}\exp\left[-\tfrac{1}{2}m'\Sigma^{-1}m\right]\int\cdots\int N(i\Sigma^{-1}m, \Sigma^{-1})(TT')^{-n/4}J_{n/2}\left[R(TT')^{1/2}\right]dT$. (Received July 3, 1956.)
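The two-dimensional central series can be evaluated term by term and compared with simulation. In the eigenbasis of $\Sigma$ we have $Y'Y = aZ_1^2 + bZ_2^2$ with independent standard normal $Z_1, Z_2$. For $a = b$ the series collapses to $1 - e^{-R^2/2a}$. Ordering the eigenvalues so that $a \le b$, which keeps $|1 - a/b| < 1$, is our assumption for well-behaved convergence. The function names are hypothetical.

```python
import random
from math import comb, factorial, sqrt

def binom_minus_half(j):
    """Generalized binomial coefficient C(-1/2, j)."""
    value = 1.0
    for i in range(j):
        value *= (-0.5 - i) / (i + 1)
    return value

def prob_quadratic_form_2d(r, a, b, terms=80):
    """Series for P(Y'Y <= r^2), Y ~ N(0, Sigma) in 2-D, eigenvalues a <= b (assumed)."""
    x = 1 - a / b
    total = 0.0
    for k in range(terms):
        inner = sum(comb(k, j) * binom_minus_half(j) * x ** j for j in range(k + 1))
        total += (r * r / (2 * a)) ** (k + 1) * sqrt(a / b) * (-1) ** k / factorial(k + 1) * inner
    return total

def simulate_quadratic_form_2d(r, a, b, reps=100_000, seed=0):
    """Monte Carlo estimate of P(a Z1^2 + b Z2^2 <= r^2)."""
    rng = random.Random(seed)
    hits = sum(a * rng.gauss(0, 1) ** 2 + b * rng.gauss(0, 1) ** 2 <= r * r for _ in range(reps))
    return hits / reps
```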


18. On the Moments of Order Statistics from a Normal Population, R. C. Bose and Shanti S. Gupta, University of North Carolina.


It is shown that if $x_{(k)}$ is the $k$th order statistic for a sample of size $n$ from a normal population $N(0, 1)$, then $\mu_r(n, k)$, the $r$th moment of $x_{(k)}$ about the origin, can be expressed in terms of lower moments of order $r - 2i$ ($i = 1, 2, \ldots$) and the integral $\int_{-\infty}^{+\infty}P_{r+1}(x)e^{-(r+1)x^2/2}\,dx$, where $P_{r+1}(x)$ for $r \ge 0$ is defined by $P_{r+1}(x) = k\binom{n}{k}\,\frac{d^r}{d\Phi^r}\left[\Phi^{k-1}(1 - \Phi)^{n-k}\right]$, where $\Phi$ is replaced after differentiation by $\Phi(x)$, the cdf of $N(0, 1)$. Exact values of all odd order moments can be derived when $n \le 5$, and exact values of all even order moments can be derived when $n \le 6$. Godwin (Ann. Math. Stat., 1949) has given a table of exact moments for $r = 1$ and 2. The corresponding tables for $r = 3$ and 4 have been provided. In general, numerical evaluation of the integral given above can be done expeditiously by the Gauss-Jacobi method of mechanical quadrature, based on zeros and weight factors corresponding to Hermite polynomials, for which tables have been provided by Salzer, Zucker, and Capuano (J. Research Nat. Bur. Standards, Vol. 48). (Received July 5, 1956.)
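For reference values, the raw moments can also be computed straight from the order-statistic density $k\binom{n}{k}\Phi^{k-1}(1 - \Phi)^{n-k}\phi$. The sketch below uses a plain trapezoid rule on a truncated range rather than the Gauss-Hermite quadrature mentioned in the abstract. The truncation at $\pm 10$ and the step count are our choices. Known checks include $E(x_{(2)}) = 1/\sqrt{\pi}$ for $n = 2$, $E(x_{(3)}) = 3/(2\sqrt{\pi})$ for $n = 3$, and $E(x_{(1)}^2) = 1$ for $n = 2$.

```python
from math import comb
from statistics import NormalDist

_STD = NormalDist()

def order_stat_moment(r, n, k, half_width=10.0, steps=4000):
    """r-th raw moment of the k-th order statistic of n draws from N(0, 1)."""
    h = 2 * half_width / steps
    const = k * comb(n, k)  # n! / ((k-1)! (n-k)!)
    total = 0.0
    for i in range(steps + 1):
        x = -half_width + i * h
        cdf = _STD.cdf(x)
        density = const * cdf ** (k - 1) * (1 - cdf) ** (n - k) * _STD.pdf(x)
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * x ** r * density
    return total * h
```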


19. Maximum Likelihood Estimation of Restricted Parameters, (Preliminary Report), H. D. Brunk, University of Missouri.


Let the $i$th of $k$ populations depend exponentially (Blackwell and Girshick, Theory of Games and Statistical Decisions, pp. 179-194) on a parameter $\omega_i$. For independent random sampling from these populations, the maximum likelihood (m.l.) estimator of the parameter point $\omega = (\omega_1, \omega_2, \ldots, \omega_k)$, given that it lies in a given closed subregion $S_0$ of its “natural” domain, is described. The estimator is shown to possess a property related to sufficiency. If $S_0$ is bounded by hyperplanes, if $A$ is the interior of $S_0$, or any of its faces, or edges, etc., if $\hat\omega(x)$ is the m.l. estimator of $\omega$, and if $E$ is an event, then for $\hat\omega(x) \in A$ there is a determination of the conditional probability $P_\omega(E \mid \hat\omega(x))$ which is independent of $\omega$ in the closure of $A$. The following consequence is related to the interpretation of sufficient estimator found in Halmos and Savage (Ann. Math. Stat., Vol. 20 (1949), p. 240). If $\omega$ is common to all faces, edges, etc. of $S_0$, then the distribution of an arbitrary statistic can be duplicated by the procedure described by Halmos and Savage; if $\omega \in S_0$, the distribution of an arbitrary statistic can be approximated for large sample sizes. (Received July 5, 1956.)
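One concrete region bounded by hyperplanes is the order cone $\omega_1 \le \omega_2 \le \cdots \le \omega_k$. For normal means with known variances, the restricted m.l. estimate is the weighted isotonic regression of the sample means, computable by the pool-adjacent-violators algorithm. This example is our illustration of the restricted-estimation setting and is not a description of the estimator in the abstract. The weights would be $n_i/\sigma_i^2$.

```python
def pool_adjacent_violators(means, weights):
    """Weighted least-squares fit to `means` under w_1 <= w_2 <= ... <= w_k."""
    blocks = []  # each block: [pooled weighted mean, total weight, number of populations]
    for m, w in zip(means, weights):
        blocks.append([m, w, 1])
        # merge backwards while the order restriction is violated
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            blocks.append([(m1 * w1 + m2 * w2) / (w1 + w2), w1 + w2, c1 + c2])
    fitted = []
    for m, _, c in blocks:
        fitted.extend([m] * c)
    return fitted
```

Populations whose estimates are pooled into one block correspond to the estimate landing on a face of the cone.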










20. A Comparison of the Power Curves of Some Double Sample Tests, Donald B. Owen, Sandia Corporation.


The exact power curves of the double sample tests introduced by the author (Ann. Math. 
Stat., Vol. 24 (1953), pp. 449-457) for hypotheses on the mean, assuming a normal popula- 
tion and known standard deviation, have been computed. The r-test procedure given in 
the referenced article is shown to be more powerful than one chosen from Bowker and 
Goode’s Sampling Inspection by Variables (McGraw-Hill Book Co., 1952) after equating 
expected sample sizes. Tabulations also show that equal size samples at the two stages give 
a more powerful procedure, except for extreme alternatives, than taking the second sample 
twice the size of the first sample when expected sample sizes are equated. These differences 
are small but could become significant if many tests using these procedures were con- 
ducted. (Received July 5, 1956.) 


21. On a Uniqueness Property not Enjoyed by the Normal Distribution, George P. Steck, Sandia Corporation.


It is known that if $X_1$ and $X_2$ are independent normal random variables with mean zero and variance $\sigma^2$, then $X_1/X_2$ has a Cauchy distribution. However, the fact that the ratio of two independent identically distributed random variables $X_1$ and $X_2$ with $EX_1 = EX_2 = 0$ has a Cauchy distribution does not imply that the random variables involved are normal. It can be shown that, for $X_1/X_2$ to have a Cauchy distribution, the characteristic function of $\log|X_1|$ must be of the form $e^{i\theta(t)}/(\cosh\tfrac{\pi t}{2})^{1/2}$, where $\theta(t)$ is real and odd. Convenient choices of the function $\theta(t)$ give tractable integrals; for example, if $\theta(t) = \arctan\tanh(\pi t/4)$, then $p_{X_1}(x) = (2^{1/2}/\pi)\,x^2/(1 + x^4)$. (Received July 5, 1956.)
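The example density can be checked by simulation. The sketch samples $X$ from $p(x) = (\sqrt{2}/\pi)x^2/(1 + x^4)$ by rejection from a standard Cauchy proposal and checks that $X_1/X_2$ behaves like a Cauchy variable, for instance $P(X_1/X_2 \le 1) = 3/4$. The envelope constant is our own bound: $p(x)$ divided by the Cauchy density equals $\sqrt{2}\,x^2(1 + x^2)/(1 + x^4)$, whose maximum is about 1.7071.

```python
import math
import random

def density(x):
    """Steck's example density p(x) = sqrt(2)/pi * x^2 / (1 + x^4)."""
    return math.sqrt(2) / math.pi * x * x / (1 + x ** 4)

def cauchy_density(x):
    return 1 / (math.pi * (1 + x * x))

ENVELOPE = 1.71  # >= max of density / cauchy_density (about 1.7071)

def sample(rng):
    """Rejection sampling from p using a standard Cauchy proposal."""
    while True:
        x = math.tan(math.pi * (rng.random() - 0.5))
        if rng.random() * ENVELOPE * cauchy_density(x) <= density(x):
            return x

def ratio_cdf_at(z, reps=50_000, seed=0):
    """Monte Carlo estimate of P(X1 / X2 <= z) for independent draws from p."""
    rng = random.Random(seed)
    return sum(sample(rng) / sample(rng) <= z for _ in range(reps)) / reps
```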


22. Confidence Intervals for the Number of Cells in a Multinomial Population with Equal Cell Probabilities, Bernard Harris, Stanford University and Department of Defense.


Consider a sample of $n$ tosses of an $r$-sided die, with faces labeled $a, a+1, \ldots, a+r-1$. The parameters $a$ and $r$ are unknown, and we wish to determine confidence limits for $r$. The joint distribution of the smallest and largest observations is computed, as well as the distribution of the sample “range” $\hat w = x_n - x_1 + 1$. Unless $n$ is very small relative to $r$, and for confidence coefficient $1 - \alpha$ ($\alpha$ small), suitable lower and upper confidence limits $c_1$, $c_2$ are given by $c_1 = \hat w$; $c_2$ is the largest positive real root of $P(y) = \hat w^n + (y - \hat w)(\hat w^n - (\hat w - 1)^n) - \alpha y^n = 0$, which can be determined by numerical methods. (Received July 6, 1956.)
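The upper limit $c_2$ can be found by bisection. For $y > \hat w$, $P(y)$ is a linear function minus $\alpha y^n$, so it is concave there. Since $P(\hat w) = (1 - \alpha)\hat w^n > 0$ and $P(y) \to -\infty$, the root above $\hat w$ is unique and is the largest root. Setting $P(y) = 0$ is equivalent to setting the counting formula $[(r - w + 1)w^n - (r - w)(w - 1)^n]/r^n$ for $\mathrm{Prob}\{\text{range} \le w\}$ equal to $\alpha$ at $r = y$. The test below uses that equivalence. The function names and the tolerance are illustrative.

```python
def upper_limit(w_hat, n, alpha, tol=1e-10):
    """Largest root c_2 of P(y) = w^n + (y - w)(w^n - (w-1)^n) - alpha * y^n."""
    def p(y):
        return w_hat ** n + (y - w_hat) * (w_hat ** n - (w_hat - 1) ** n) - alpha * y ** n

    lo, hi = float(w_hat), 2.0 * w_hat
    while p(hi) > 0:  # P(w_hat) > 0 and P(y) -> -inf, so widen until the root is bracketed
        lo, hi = hi, 2 * hi
    while hi - lo > tol * hi:
        mid = (lo + hi) / 2
        if p(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def prob_range_at_most(w, r, n):
    """P(sample range <= w) for n tosses of a fair r-sided die."""
    return ((r - w + 1) * w ** n - (r - w) * (w - 1) ** n) / r ** n
```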


23. On Some Non-parametric C-sample Tests, Fred C. Andrews, University of Nebraska.


The Q test statistic (proposed by Terpstra, Koninklijke Nederlandse Akademie van Wetenschappen, Series A, v. LVII, 1954, p. 505) is shown to be asymptotically the sum of two quadratic forms $Q_1$ and $Q_2$, where $Q_1$ is a function only of $c - 1$ Mann-Whitney statistics and $Q_2$ is a function of linear combinations of Mann-Whitney statistics with coefficients depending upon the sample sizes. A $c$-sample test based upon the statistic $Q_1$ is discussed and is shown to have asymptotic relative efficiency one with respect to the Wallis-Kruskal H test. (Received July 9, 1956.)







24. An Asymptotically Distribution-free Multiple Comparison Method with Application to the Problem of n Rankings of m Objects, Irene Rosenthal and Thomas S. Ferguson, University of California at Berkeley.


The statistic known as Hotelling’s $T^2$ is seen to be asymptotically distribution-free and thus to provide a multiple confidence ellipsoid for the joint population means from which multiple comparisons of the means may be made. If $m$ objects are ranked in order by each of $n$ judges, Friedman’s test is used to test the hypothesis that the judges rank the objects at random. However, it is usually of interest to inquire which of the objects are ranked significantly higher than which others, in which case Hotelling’s $T^2$ may be used to provide an asymptotically distribution-free multiple comparison. It is seen also that the computations involved in the multiple comparison are very simple. (Received July 9, 1956.)


25. Idempotent Matrices and Quadratic Forms in the General Linear Hypothesis, Franklin A. Graybill and George Marsaglia, Oklahoma A. and M. College.


The important role that idempotent matrices play in the general linear hypothesis theory has long been recognized, but their usefulness seems not to have been fully exploited. The purpose of this paper is to state and prove some theorems concerning idempotent matrices and to point out how they might be used in linear hypothesis theory. Let $\chi'^2(p, \lambda)$ represent a noncentral chi-square variate with $p$ degrees of freedom and with noncentrality $\lambda$. By an idempotent matrix $B$ we will mean a square matrix such that $B = B'$ and $BB = B$. The following theorem is shown to be true. Let $Y$ be an $n \times 1$ random vector which has a multivariate normal distribution with mean equal to the $n \times 1$ vector $\mu$ and with variance-covariance matrix $V$ (positive definite). Also let $B$ be an $n \times n$ matrix with rank equal to $p$, and $B_i$ ($i = 1, 2, \ldots, k$) be an $n \times n$ matrix with rank equal to $p_i$, where $Y'BY = Y'B_1Y + Y'B_2Y + \cdots + Y'B_kY$. Then any one of the six conditions $C_1, C_2, C_3, C_4, C_5, C_6$ is necessary and sufficient that the $Y'B_iY$ be independently distributed as $\chi'^2(p_i, \lambda_i)$, where $\lambda_i = \tfrac{1}{2}\mu'B_i\mu$. $C_1$: $BV$ be idempotent and $p_1 + p_2 + \cdots + p_k = p$. $C_2$: $BV$ and each $B_iV$ be idempotent. $C_3$: $BV$ be idempotent and $B_iVB_j = \phi$ (where $\phi$ is the null vector or the null matrix) for all $i \ne j$. $C_4$: $Y'BY$ be distributed as $\chi'^2(p, \lambda)$ and $p_1 + p_2 + \cdots + p_k = p$ ($\lambda = \tfrac{1}{2}\mu'B\mu$). $C_5$: $Y'BY$ be distributed as $\chi'^2(p, \lambda)$ and each $B_iV$ be idempotent (where $\lambda = \tfrac{1}{2}\mu'B\mu$). $C_6$: $Y'BY$ be distributed as $\chi'^2(p, \lambda)$ and $B_iVB_j = \phi$ for all $i \ne j$. From this general theorem many important special cases can be derived, viz., if $B = V = I$ (the identity matrix) and if $\mu = \phi$, then $C_1$ is the well-known Cochran-Fisher theorem on quadratic forms for normal independent variables. (Received July 9, 1956.)*
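The Cochran-Fisher special case ($B = V = I$, $\mu = \phi$) can be illustrated with the familiar split $Y'Y = n\bar Y^2 + \sum(Y_i - \bar Y)^2$, where $B_1 = J/n$ and $B_2 = I - J/n$. The sketch verifies with exact rational arithmetic that each $B_i$ is symmetric idempotent, that the ranks (traces) add to $n$, and that $B_1B_2$ is the null matrix. These are the ingredients of conditions $C_1$ to $C_3$. The helper names are hypothetical.

```python
from fractions import Fraction

def matmul(a, b):
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def is_symmetric_idempotent(m):
    size = len(m)
    symmetric = all(m[i][j] == m[j][i] for i in range(size) for j in range(size))
    return symmetric and matmul(m, m) == m

def trace(m):
    # for a symmetric idempotent matrix, the trace equals the rank
    return sum(m[i][i] for i in range(len(m)))

def mean_and_deviation_split(n):
    """B_1 = J/n (grand-mean term) and B_2 = I - J/n (deviation term), exact entries."""
    b1 = [[Fraction(1, n)] * n for _ in range(n)]
    b2 = [[(1 if i == j else 0) - Fraction(1, n) for j in range(n)] for i in range(n)]
    return b1, b2
```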


26. Some Asymptotic Results on Wald’s Approximate Classification Statistic, M. Iqbal, University of North Carolina.*


In his paper “On a Statistical Problem Arising in the Classification of an Individual into One of Two Groups,” published in the Annals of Mathematical Statistics (1944), Wald proposed a statistic $U$ for use in a classification procedure. In an attempt to find its exact sampling distribution, he ended up with a joint distribution of three variables, $m_1$, $m_2$ and $m_3$, and showed that nm; can be taken as an approximate classification statistic. In the first part of this paper, an asymptotic series is obtained for ms by starting with the joint distribution of m, m,, and m, in the degenerate case p, = 0 = ¢, and by noticing that the region of integration for m; and mz for fixed m; is the one enclosed by two hyperbolas in the (m; , mz) plane. In the second part of the paper, asymptotic moments for nm; are obtained by using the fact that m; , mz, and ms are of order $1/n$ in the probability sense, and the corresponding asymptotic distributions are obtained for both even and odd values of $p$. (Received July 9, 1956.)

* Research under Office of Naval Research Contract No. NR-042-031.


26a. On Infinitely Divisible Random Vectors, Meyer Dwass, Northwestern and Stanford Universities, and Henry Teicher, Purdue and Stanford Universities.


A normally distributed random vector $X$ is well known to be representable by $AY$ (in the sense of having identical distributions), where $A$ is a matrix of constants and $Y$ is a random vector whose component random variables are independent. A necessary and sufficient condition for any infinitely divisible random vector to be so representable is given. The limiting case is discussed, as are connections with the multivariate Poisson distribution and stochastic processes. (Received July 10, 1956.)


26b. A Further Contribution to the Theory of Univariate Sampling on Successive Occasions, (Preliminary Report), B. D. Tikkiwal, University of North Carolina and Karnatak University.


The general theory of sampling on successive occasions for a single variate has been studied independently by Patterson (J.R.S.S., 1950, B 12) and the author (Theory of successive sampling, unpublished thesis submitted towards partial fulfilment of requirements for Diploma, I.C.A.R., New Delhi, 1951). Both authors obtained the variance of the best linear unbiased estimator under the assumptions that the weight $\phi_h$ and the regression coefficient occurring in the estimator are not calculated from the sample but are known in advance. In this paper the variance of the estimator is obtained without these assumptions. It is shown that, when the common units on the $t$th occasion are a sub-sample of the $n''_{t-1}$ new units on the $(t-1)$th occasion, for $t = 3, \ldots, h$, the variance of the estimator on the $h$th occasion is given by $E(\hat\phi_h\sigma_h^2/n_h)$, where $\hat\phi_h$ is the estimate of the weight $\phi_h$. A consistent estimator of the variance expression has also been obtained. It is further shown that the modification suggested by Narain (J. Ind. Soc. Agric. Stat., 1953) in the weighting procedure given by Patterson and the author increases the variance of the estimator and thus makes it less efficient. (Received July 10, 1956.)


26c. Invariance, Sequential Decision Functions, and Continuous Time Processes, Professor J. Kiefer, Cornell University.


It is shown that in sequential decision problems where certain conditions on a sequence 
of sufficient statistics hold, there exist fixed sample-size invariant procedures which are 
minimax in the class of all decision functions. Examples are problems of sequential estima- 
tion of an unknown real scale and/or location parameter for the normal, gamma, and 
rectangular distributions (excepting the location parameter alone in the latter), certain mul- 
tivariate estimation problems for these distributions, etc. The method applies also to estima- 
tion problems for the Wiener and gamma processes in one or several dimensions, and a 
modification of the result is obtained for processes such as the Poisson process where there 
is an invariance in time. (Some of these results were obtained for special weight functions 
by various authors using the Bayes or Cramér-Rao techniques.) The method of proof uses 
an invariance theorem which is a slight generalization of one due to Peisakoff and which 
may also be applied to many nonsequential problems. Various sets of conditions on the
weight function and group, under which this theorem holds, are given. (Received July 12,
1956.)








870 ABSTRACTS 


26d. A Stochastic Model for the Tunnelling and Retunnelling of the Flour Beetle,
Mohamed S. Ahmed, University of California at Berkeley.


The stochastic model developed to describe the tunnelling and retunnelling of the flour
beetle is a Markov chain with only three states, $s_1$ (tunnelling), $s_2$ (stationary) and $s_3$
(retunnelling), with continuous time parameter and stationary transition intensities
$q_{ij}$ $(i, j = 1, 2, 3)$. Transition from one state to any other is envisaged, with the restriction
that the beetle cannot move from the tunnelling state to the retunnelling state, or vice
versa, except by passing through the stationary state.

For this model, (1) the probability that the beetle is in state $s_j$ at the end of a time
interval $t$, given that it was in state $s_i$ at the beginning of this time interval, and (2)
the expected time spent in a state $s_j$ out of the total exposure time $T$ are found explicitly in
terms of the four unknown parameters $q_{12}$, $q_{21}$, $q_{23}$ and $q_{32}$. In addition, a scheme for the
estimation of these parameters is given.

This model can be used to study the differences in the behavior of, say, the male and 
the female beetles with regard to the proportion of time they spend in the various states. 
(Received July 13, 1956.) 
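The quantities (1) and (2) can also be evaluated numerically. The Python sketch below is an illustration under stated assumptions, not the author's explicit solution: it builds the intensity matrix $Q$ with $q_{13} = q_{31} = 0$ (the restriction described above), computes $P(t) = \exp(Qt)$ by scaling and squaring a truncated Taylor series, and integrates row $i$ of $P(u)$ over $[0, T]$ with the trapezoidal rule. The function names and the intensity values in the usage lines are hypothetical.

```python
import numpy as np

def generator(q12, q21, q23, q32):
    """Intensity matrix Q for s1 (tunnelling), s2 (stationary), s3 (retunnelling).
    Direct moves s1 <-> s3 are excluded, so Q[0, 2] = Q[2, 0] = 0."""
    return np.array([
        [-q12, q12, 0.0],
        [q21, -(q21 + q23), q23],
        [0.0, q32, -q32],
    ])

def transition_matrix(Q, t, terms=30, squarings=10):
    """P(t) = exp(Qt); entry [i, j] is Pr(state s_j at time t | state s_i at time 0).
    Computed by scaling and squaring a truncated Taylor series."""
    A = Q * (t / 2 ** squarings)
    P = np.eye(len(Q))
    term = np.eye(len(Q))
    for k in range(1, terms):
        term = term @ A / k
        P = P + term
    for _ in range(squarings):
        P = P @ P
    return P

def expected_occupation(Q, i, T, steps=1000):
    """Expected time spent in each state during [0, T] when starting in s_i,
    i.e. the integral of row i of P(u) over 0 <= u <= T (trapezoidal rule)."""
    h = T / steps
    step = transition_matrix(Q, h)   # P(u + h) = P(u) P(h)
    row = np.eye(len(Q))[i]
    total = row / 2
    for _ in range(steps - 1):
        row = row @ step
        total = total + row
    row = row @ step
    total = total + row / 2
    return total * h

Q = generator(q12=0.5, q21=0.3, q23=0.2, q32=0.4)   # hypothetical intensities
print(transition_matrix(Q, 2.0).round(4))
print(expected_occupation(Q, i=0, T=10.0).round(3))
```

Because the chain can only pass through $s_2$, it satisfies detailed balance, so as $t$ grows every row of $P(t)$ approaches the stationary distribution proportional to $(1,\ q_{12}/q_{21},\ q_{12}q_{23}/(q_{21}q_{32}))$.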


26e. The $r$th Brightest Star in a Galaxy as a Distance Indicator, Mandakini
Sane, University of California at Berkeley.


The astronomical problem mentioned in the title reduces to that of estimating the location
parameter $\theta$ of an absolutely continuous distribution of a random variable $X$ from the $r$th smallest
observation $X_r$ in a sample of size $N$. Three cases are considered: (i) $N$ fixed and moderate, (ii)
$N$ fixed but large, (iii) $N$ random, tending to $+\infty$ in law.

The principal results are: (a) Smirnov's three limit laws for $X_r$ hold if the random $N$ tends
to $+\infty$ suitably. (b) In two of Smirnov's limiting cases, with observables $X_1, X_2, \ldots, X_r$,
the last, $X_r$, is asymptotically sufficient for $\theta$. In the third case $X_1$ is asymptotically
sufficient. The normal and Cauchy distributions belong to the first two types; the normal distribution
truncated from the left belongs to the third type. (c) Given an integer $p$, $0 < p < r$, the method
of best linear unbiased estimates can be used to determine those $p$ of the $r$ available
observables $X_1, X_2, \ldots, X_r$ which yield the least variance. (Received July 13, 1956.)


26f. Effect of Expansion of the Universe on the Serial Correlations of Counts
of Images of Galaxies in Regularly Spaced Squares: A Simplified Model,
Martin Fox, University of California at Berkeley.


The Neyman-Scott theory of clustering of galaxies in a static universe (Ap. J., 1952)
and Neyman's extension to an expanding universe (Ann. Inst. Henri Poincaré, 1955) are so
complex that a non-numerical study of the effect of expansion on serial correlations of counts
of galaxies seems prohibitive. To gain insight into the situation it is justifiable
to consider a simpler, although less realistic, model. We assume: (i) Galaxies occur in clusters.
(ii) Cluster centers are Poisson distributed. (iii) Clusters appear as equal and similarly
oriented squares perpendicular to the line of sight. (iv) Clusters are visible up to a limiting
distance. (v) The "number of images of galaxies" in a cell on a photograph equals the sum
of the areas of projections of all "visible" clusters overlapping this cell. (vi) If the universe is
expanding, the velocity of recession of clusters is directly proportional to their distance
from the observer. The present model differs from earlier models in the content of
hypotheses (iii), (iv) and (v). Under the present model, explicit formulae are given for the
serial correlation between "numbers of images of galaxies" in regularly spaced equal
squares. These correlations depend upon the dimensions of the squares, the limit of
"visibility" of clusters, and whether the universe is expanding. (Received July 16, 1956.)













26g. Contributions to Univariate and Multivariate Components of Variance
Analysis, S. N. Roy and R. Gnanadesikan, University of North Carolina.


Assume a "normal" univariate linear hypothesis model which involves $m$ unknown parameters
in $k$ groups, $\xi_{11}, \ldots, \xi_{1m_1}; \xi_{21}, \ldots, \xi_{2m_2}; \ldots; \xi_{k1}, \ldots, \xi_{km_k}$ (with $\sum_i m_i = m$);
denote by $\sigma^2$ the common unknown variance of the observations and by $(S.S.)_i$ the sum
of squares due to $H_{0i}: \xi_{i1} = \cdots = \xi_{im_i}$ $(i = 1, 2, \ldots, k)$; and assume that the design matrix
is such that the $\sigma^{-2}(S.S.)_i$'s are independent $\chi^2$'s with $m_i - 1$ d.f. $(i = 1, 2, \ldots, k)$. Then,
switching over to the case where the first $m_1$ parameters are independent $N(\mu_1, \sigma_1)$, the second
$m_2$ are independent $N(\mu_2, \sigma_2)$ and also independent of the first $m_1$, and so on, and all are
independent of the "error" normal variates, it is possible to obtain, with a joint confidence
coefficient $\geq 1 - \alpha$, simultaneous confidence bounds on $\sigma_1^2/\sigma^2, \sigma_2^2/\sigma^2, \ldots, \sigma_k^2/\sigma^2$. Multivariate
extensions of this, involving the roots of $\Sigma_1\Sigma^{-1}, \ldots, \Sigma_k\Sigma^{-1}$, are discussed next. What
happens in the case of design matrices for which independent $\chi^2$'s are not available (which
would mean that a similar simplification for the multivariate problem would also not be
available) is also discussed. (Received July 16, 1956.)


26h. Further Contributions to Multivariate Confidence Bounds, S. N. Roy
and R. Gnanadesikan, University of North Carolina.


Assuming $S$ to be the dispersion matrix of a random sample of size $n$ from a $p$-variate
normal population (with $p < n$) having dispersion matrix $\Sigma$, and denoting by $S^{(i)}, \Sigma^{(i)}$,
$S^{(i,j)}, \Sigma^{(i,j)}$, etc., the corresponding matrices obtained by cutting out the $i$th variate, the $i$th and
the $j$th variates, and so on, it is possible to find constants $\lambda_{1\alpha}$ and $\lambda_{2\alpha}$ such that there exist,
with a joint confidence coefficient $\geq 1 - \alpha$, the confidence bounds

$$\lambda_{1\alpha}\, c_{\max}(S) \le c_{\max}(\Sigma) \le \lambda_{2\alpha}\, c_{\max}(S), \qquad \lambda_{1\alpha}\, c_{\min}(S) \le c_{\min}(\Sigma) \le \lambda_{2\alpha}\, c_{\min}(S);$$

$$\lambda_{1\alpha}\, c_{\max}(S^{(i)}) \le c_{\max}(\Sigma^{(i)}) \le \lambda_{2\alpha}\, c_{\max}(S^{(i)}), \qquad \lambda_{1\alpha}\, c_{\min}(S^{(i)}) \le c_{\min}(\Sigma^{(i)}) \le \lambda_{2\alpha}\, c_{\min}(S^{(i)});$$

$$\lambda_{1\alpha}\, c_{\max}(S^{(i,j)}) \le c_{\max}(\Sigma^{(i,j)}) \le \lambda_{2\alpha}\, c_{\max}(S^{(i,j)}), \qquad \lambda_{1\alpha}\, c_{\min}(S^{(i,j)}) \le c_{\min}(\Sigma^{(i,j)}) \le \lambda_{2\alpha}\, c_{\min}(S^{(i,j)});$$

and so on. Here $c_{\max}(M)$ and $c_{\min}(M)$ stand for the largest and smallest characteristic roots of a square
matrix $M$ with non-negative characteristic roots $c(M)$ (say). With slight modifications, similar
results are true for $c(\Sigma_1\Sigma_2^{-1})$ in relation to $c(S_1S_2^{-1})$ in the case of two samples and two
populations, in the case of the regression matrix of $p$ variates on $q$ variates, and also in the
case of the regression matrix which measures the deviation from the customary multivariate
linear hypothesis on means. (Received July 16, 1956.)
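The abstract does not give the constants $\lambda_{1\alpha}$ and $\lambda_{2\alpha}$, so the Python sketch below only makes the notation concrete: it computes $c_{\min}$ and $c_{\max}$ for $S$, $S^{(i)}$ and $S^{(i,j)}$ with NumPy. The helper name, the simulated data and the $n - 1$ divisor for $S$ are assumptions for illustration. By the Cauchy interlacing theorem, the roots of a matrix with variates cut out always lie between $c_{\min}(S)$ and $c_{\max}(S)$.

```python
import numpy as np

def extreme_roots(M, drop=()):
    """(c_min, c_max): smallest and largest characteristic roots of the symmetric
    matrix M after cutting out the variates whose indices are in `drop`.
    drop=() gives M itself, drop=(i,) gives M^(i), drop=(i, j) gives M^(i,j)."""
    dropped = set(drop)
    keep = [k for k in range(M.shape[0]) if k not in dropped]
    roots = np.linalg.eigvalsh(M[np.ix_(keep, keep)])   # returned in ascending order
    return roots[0], roots[-1]

# Hypothetical sample: n = 50 observations on p = 4 variates.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 4))
S = np.cov(X, rowvar=False)             # divisor n - 1 (an assumption)
for drop in [(), (0,), (0, 1)]:
    c_min, c_max = extreme_roots(S, drop)
    print(drop, round(c_min, 3), round(c_max, 3))
```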


(Additional abstract for the Princeton meeting of the Institute, April 20-21, 1956) 


27. Incomplete Block Rank Analysis: $2^{n-p}$ Fractional Factorials Using a Method
of Paired Comparisons, Otto Dykstra, Jr., General Foods Corporation
(introduced by C. Daniel).


The estimation and tests of significance for main effects and interactions in $2^{n-p}$ fractional
factorials are developed and illustrated using the $2 \times 2$ factorial used by Abelson and
Bradley. The results are comparable with theirs. The procedure for handling 4-level factors is also
discussed. An example shows the partitioning of the 3 degrees of freedom for a 4-level
quantitative variable into linear, quadratic and cubic effects. (Received April 30, 1956.)







(Abstracts of papers submitted for the Detroit meeting of the Institute, September 7-10, 1956) 


28. Some Results on the Analysis of Random Signals by Means of a Cut-
Counting Process, Irwin Miller, Graduate Fellow, and John E. Freund,
Virginia Polytechnic Institute.


The variance of the number of zeros of a Gaussian signal on a short time interval was
derived in a recent paper by Steinberg et al. This result is generalized to include the
covariance of the numbers of zeros of a Gaussian signal at the values $\theta_1$ and $\theta_2$, using a somewhat
different mathematical approach. A test of normality in a random process is proposed, using
the general result. (Received May 21, 1956.)


29. Some Results on the Distribution of the Peaks of a Gaussian Process,
Irwin Miller, Graduate Fellow, and John E. Freund, Virginia Polytechnic
Institute.


The behavior of the relative extrema (peaks) of a Gaussian random process $x(t)$ inside
the band $\theta_1 \le x(t) \le \theta_2$ is discussed in this paper. The expected number and the variance
of such peaks are derived. The method of derivation uses the concept of functionals, which
reduces the mathematics involved and adds to the intuitive appeal of the argument. The
results are specialized to include half-infinite and infinite band widths. These results will
be useful in studying problems of gust loads and turbulence which arise in aircraft design.
(Received May 21, 1956.)


30. A Continuous Time Treatment of the Waiting-time in a Queueing System
Having Poisson Arrivals, a General Distribution of Service-time, and a
Single Service Unit (Preliminary Report), Václav Edvard Beneš,
Bell Telephone Laboratories.


Let $W(t)$ be the time that a customer would have to wait if he arrived at time $t$; $W(t)$ is
a mixed-type Markov process, i.e., both discontinuous and continuous changes occur in
$W(t)$. The integro-differential equation for the distribution of $W(t)$ is solved by the use of
Laplace and Laplace-Stieltjes transforms. Let $P(t) = \Pr\{W(t) = 0\}$. The Laplace-Stieltjes
transform of $W(t)$ is expressed as a functional of $P(t)$; the Laplace transform of $P(t)$ is
determined to be $\eta^{-1}\phi(\eta)$, where $\phi$ is the Laplace-Stieltjes transform of $W(0)$, and where $\eta(r)$
is the unique root of the equation $r - \eta + \lambda = \lambda B^*(\eta)$, in which $\lambda$ is the Poisson arrival rate
and $B^*$ is the Laplace-Stieltjes transform of the service-time distribution. The transform
of the time taken to return to 0 from $W(0)$ is $\phi(\eta)$. It is shown that any analytic function
of the root $\eta$ can be expanded in a Bürmann-Lagrange series. These results are used to verify
the asymptotic properties of $W(t)$. A functional relation is obtained between the expectation
$E\{W(t)\}$ and $P(t)$; from this relation the covariance function $R$ of $W(t)$, when it is defined,
is determined by means of the solution for $P(t)$. It is shown that if the service-time
distribution has a finite fourth moment, then $R$ is absolutely integrable, and the spectral
distribution of $W(t)$ is absolutely continuous, with a continuous density. (Received June
5, 1956.)
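The root $\eta(r)$ rarely has a closed form, but it is easy to compute numerically. The Python sketch below uses bisection, a swapped-in numerical method rather than the abstract's Bürmann-Lagrange series, and assumes $r > 0$ and service times that are not identically zero. Under those assumptions $g(x) = r - x + \lambda - \lambda B^*(x)$ is concave (a Laplace-Stieltjes transform is convex), positive at $x = r$ and negative at $x = r + \lambda$, so the root in that interval is unique. The names `eta`, `exp_service` and `det_service` and the parameter values are hypothetical.

```python
import math

def eta(r, lam, B_star, rel_tol=1e-13):
    """Root eta(r) of  r - eta + lam = lam * B_star(eta),  for r > 0, by bisection.
    g(x) = r - x + lam - lam * B_star(x) is concave, with g(r) > 0 and
    g(r + lam) < 0, so [r, r + lam] brackets a single sign change."""
    g = lambda x: r - x + lam - lam * B_star(x)
    lo, hi = r, r + lam
    while hi - lo > rel_tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

lam, mu = 0.5, 1.0
exp_service = lambda s: mu / (mu + s)          # exponential service times, mean 1/mu
det_service = lambda s: math.exp(-s / mu)      # constant service time 1/mu

for r in (0.1, 1.0):
    e = eta(r, lam, exp_service)
    # With W(0) = 0 we have phi = 1, so the Laplace transform of P(t) at r is 1 / eta(r).
    print(r, e, 1 / e, eta(r, lam, det_service))
```

For exponential service the equation reduces to the quadratic $\eta^2 - (r + \lambda - \mu)\eta - r\mu = 0$, whose positive root checks the bisection. As a further check, $r/\eta(r)$ tends to $1 - \lambda/\mu$ as $r \to 0$, the long-run probability of an empty system.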


31. On a Multivariate Tchebycheff Inequality (Preliminary Report), Ingram
Olkin and John W. Pratt, University of Chicago.


Let $x$ be a random $p$-vector with mean 0, variances $1/k^2$, and correlation matrix $R$. A
matrix $A$ is admissible if $xAx' \ge 1$ for $x \in S = \{x : \text{some } |x_i| \ge 1\}$ and $xAx' \ge 0$ for all $x$. Then,
for $A$ admissible, $P(x \in S) \le E(xAx') = \operatorname{tr} AR/k^2$. Let $A^{-1} = (1 - t)I + te'e$, where
$e = (1, \ldots, 1)$ and $-(p - 1)^{-1} < t < 1$.

Then $A$ is admissible, and minimizing $\operatorname{tr} AR$ over $t$ yields the sharpest inequality for these $A$.
This generalizes the bivariate inequality of Berge (Biometrika, 29 (1937)). For the admissible
$A$ minimizing $\operatorname{tr} AR$, $xAx'$ has a minimum of 1 on each plane $x_i = 1$, and therefore $A^{-1} = B$
has unit diagonal. Since $\operatorname{tr} B^{-1}R$ is a strictly convex function of $B$, it has a unique minimum,
determined by the condition that $BR^{-1}B$ be diagonal; $B$ reflects certain properties of $R$.
If $R^{1/2}$ has equal diagonal elements $d$, then $B = d^{-1}R^{1/2}$, the bound is $d^2p/k^2$, and it is achieved if each
row of $\pm B$ has probability $d^2/(2k^2)$ and 0 has probability $1 - d^2p/k^2$. If $R = (1 - \rho)I + \rho e'e$, the bound is
$[(p - 1)\sqrt{1 - \rho} + \sqrt{1 + (p - 1)\rho}\,]^2/(pk^2)$, which reduces to the univariate Tchebycheff
inequality for $p = 1$. For $p$ uncorrelated variables the bound is $pk^{-2}$, whereas the Tchebycheff
bound for $p$ independent variables is $1 - (1 - k^{-2})^p$. A simple transformation generalizes
all these results to arbitrary variances and sets $\{x : \text{some } |x_i| \ge k_i\}$. (Received
June 15, 1956.)
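The statement that minimizing $\operatorname{tr} AR$ over $t$ gives the closed-form bound for equicorrelated $R$ can be checked numerically. The Python sketch below, an illustration rather than the authors' derivation, runs a golden-section search over $t$ (the objective is convex in $t$ because $\operatorname{tr} B^{-1}R$ is convex in $B$ and $B$ is affine in $t$) and compares the result with the equicorrelated formula. Function names are hypothetical, and $p \ge 2$ is assumed so that the range of $t$ is defined.

```python
import numpy as np

def equicorrelation(p, rho):
    """R = (1 - rho) I + rho e'e, where e'e is the p x p matrix of ones."""
    return (1 - rho) * np.eye(p) + rho * np.ones((p, p))

def best_bound_in_family(R, k, tol=1e-10):
    """Minimise tr(A R) / k^2 over A with A^{-1} = (1 - t) I + t e'e,
    -1/(p - 1) < t < 1 (p >= 2), by golden-section search on t."""
    p = R.shape[0]
    I, J = np.eye(p), np.ones((p, p))

    def objective(t):
        B = (1 - t) * I + t * J                  # unit diagonal, so A = B^{-1} is admissible
        return np.trace(np.linalg.solve(B, R)) / k**2   # tr(B^{-1} R) / k^2

    lo, hi = -1 / (p - 1) + 1e-9, 1 - 1e-9
    g = (np.sqrt(5) - 1) / 2
    while hi - lo > tol:
        a, b = hi - g * (hi - lo), lo + g * (hi - lo)
        if objective(a) < objective(b):
            hi = b
        else:
            lo = a
    return objective((lo + hi) / 2)

def equicorrelated_bound(p, rho, k):
    """[(p - 1) sqrt(1 - rho) + sqrt(1 + (p - 1) rho)]^2 / (p k^2)."""
    return ((p - 1) * np.sqrt(1 - rho) + np.sqrt(1 + (p - 1) * rho)) ** 2 / (p * k**2)

R = equicorrelation(4, 0.3)
print(best_bound_in_family(R, k=2.0), equicorrelated_bound(4, 0.3, k=2.0))
```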


32. Unbiased Estimation of Correlation Coefficients, Ingram Olkin and John
W. Pratt, University of Chicago.


Unbiased estimators of certain multivariate normal correlation coefficients are given,
namely: (1) bivariate correlation; (2) intraclass correlation, i.e., the common correlation
coefficient of a distribution with equal variances and equal covariances; (3) partial
correlation; (4) squared multiple correlation. In each case, the estimator is a function of a complete
sufficient statistic and is therefore the unique minimum variance unbiased estimator. It
is a strictly increasing function of the usual estimator, differing from it only by terms of
order $1/n$ and consequently having the same asymptotic distribution. The range of the
unbiased estimator is the region of possible values of the estimated quantity, except in (4),
where any unbiased estimator must assume negative values. The underlying method is that
of inverting Laplace transforms. For (1) and (2), the unbiased estimators are
$G(r) = rF(\tfrac12, \tfrac12; (n - 1)/2; 1 - r^2)$ and
$H(r') = -H(-r') = 2F(1, (2 - n)/2; n/2; (1 - r')/(1 + r')) - 1$, $r' \ge 0$, where $n$ is the number
of degrees of freedom, $r$ and $r'$ are the usual estimates, and $F$ is the hypergeometric function.
Tables of these functions are given. (Received June 15,
1956.)
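$G$ and $H$ are easy to evaluate by summing the hypergeometric series directly. The Python sketch below assumes $0 \le 1 - r^2 \le 1$ and $0 \le (1 - r')/(1 + r') \le 1$, where the series converge for the parameter values above; $G(0) = 0$ is returned directly to avoid slow convergence at argument 1. The function names are hypothetical, and plain term-by-term summation is a simple choice, not the authors' tabulation method.

```python
def hyp2f1(a, b, c, z, rel_tol=1e-15, max_terms=100_000):
    """Gauss hypergeometric series F(a, b; c; z) = sum_j (a)_j (b)_j z^j / ((c)_j j!),
    summed term by term (adequate for 0 <= z <= 1 when c - a - b > 0)."""
    total, term = 1.0, 1.0
    for j in range(max_terms):
        term *= (a + j) * (b + j) / ((c + j) * (j + 1)) * z
        total += term
        if abs(term) <= rel_tol * abs(total):
            break
    return total

def unbiased_correlation(r, n):
    """G(r) = r F(1/2, 1/2; (n - 1)/2; 1 - r^2); n = degrees of freedom."""
    if r == 0:
        return 0.0
    return r * hyp2f1(0.5, 0.5, (n - 1) / 2, 1 - r * r)

def unbiased_intraclass(r_prime, n):
    """H(r') = 2F(1, (2 - n)/2; n/2; (1 - r')/(1 + r')) - 1 for r' >= 0,
    extended to r' < 0 by H(r') = -H(-r')."""
    if r_prime < 0:
        return -unbiased_intraclass(-r_prime, n)
    z = (1 - r_prime) / (1 + r_prime)
    return 2 * hyp2f1(1.0, (2 - n) / 2, n / 2, z) - 1

for r in (0.2, 0.5, 0.9):
    print(r, round(unbiased_correlation(r, n=10), 4), round(unbiased_intraclass(r, n=10), 4))
```

Since every term of $F(\tfrac12, \tfrac12; c; z)$ is non-negative, $|G(r)| \ge |r|$: the unbiased estimator moves $r$ slightly away from zero, by about $r(1 - r^2)/(2(n - 1))$ for large $n$ (the first term of the series).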


33. Unbiased Estimation of the Normal Distribution Function (Preliminary
Report), William C. Healy, Jr., Ethyl Corporation Research Laboratories.


Let $X$ be normally distributed with distribution function $\Phi(x; \theta, \sigma^2) = \Pr(X \le x \mid \theta, \sigma^2)$,
$\theta$ and $\sigma^2$ being unknown. The problem considered is the estimation of $\Phi$ for a given value
of $x$ from a sample $X_1, X_2, \ldots, X_n$. Possible estimates include (a) $m/n$, where $m$ is the
number of observations $X_i$ not exceeding $x$; (b) $\Phi(x; \bar{X}, s^2)$, where $\bar{X} = \sum X_i/n$ and
$s^2 = \sum (X_i - \bar{X})^2/(n - 1)$; and (c) plots on probability paper. Baker (Ann. Math. Stat., Vol. 20
(1949), p. 123) has given asymptotic properties of (b) and a comparison with (a), but an
estimate optimal in small samples seems not to have been discussed. The present paper
derives the uniformly minimum variance unbiased estimate of $\Phi$ by computing the conditional
probability $\Pr(X_1 \le x \mid \bar{X}, s^2)$. The result is $\hat\Phi(x) = [1 + u\,\operatorname{sgn}(x - \bar{X})]/2$, where
$u = I_v(\tfrac12, (n - 2)/2)$, $v = \min[1, n(x - \bar{X})^2/((n - 1)^2 s^2)]$, and $I_v(p, q)$ is the incomplete
Beta-function ratio. This estimate $\hat\Phi(x)$ and (b) are asymptotically equivalent. Empirical
investigation of the sampling variance of $\hat\Phi(x)$ is under way, for small-sample comparisons
with other methods. (Received June 25, 1956.)
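The estimate $\hat\Phi(x)$ needs only $\bar{X}$, $s^2$ and one incomplete Beta-function ratio. The Python sketch below is an illustration, not the author's code: it evaluates $I_v(\tfrac12, q)$ through the substitution $t = w^2$, which turns it into $\frac{2}{B(1/2, q)}\int_0^{\sqrt v}(1 - w^2)^{q-1}\,dw$, and applies Simpson's rule, which assumes $q \ge 1$ (that is, $n \ge 4$) so the integrand is bounded. The function names and the sample are hypothetical; estimate (b) is printed alongside for comparison.

```python
import math
import statistics

def inc_beta_half(v, q, intervals=2000):
    """Regularised incomplete Beta ratio I_v(1/2, q) for q >= 1.
    With t = w**2, I_v(1/2, q) = (2 / B(1/2, q)) * integral_0^sqrt(v) (1 - w^2)^(q-1) dw,
    evaluated by composite Simpson's rule (intervals must be even)."""
    if v <= 0.0:
        return 0.0
    if v >= 1.0:
        return 1.0
    h = math.sqrt(v) / intervals
    s = 0.0
    for i in range(intervals + 1):
        w = i * h
        weight = 1 if i in (0, intervals) else (4 if i % 2 else 2)
        s += weight * (1.0 - w * w) ** (q - 1.0)
    integral = 2.0 * s * h / 3.0
    beta = math.exp(math.lgamma(0.5) + math.lgamma(q) - math.lgamma(q + 0.5))
    return integral / beta

def umvue_normal_cdf(x, sample):
    """[1 + u sgn(x - xbar)] / 2 with u = I_v(1/2, (n - 2)/2) and
    v = min[1, n (x - xbar)^2 / ((n - 1)^2 s^2)]; this sketch needs n >= 4."""
    n = len(sample)
    xbar = sum(sample) / n
    s2 = sum((xi - xbar) ** 2 for xi in sample) / (n - 1)
    v = min(1.0, n * (x - xbar) ** 2 / ((n - 1) ** 2 * s2))
    u = inc_beta_half(v, (n - 2) / 2)
    sign = (x > xbar) - (x < xbar)
    return (1.0 + u * sign) / 2.0

sample = [0.3, -1.2, 0.8, 1.5, -0.4, 0.1, 2.0, -0.7]      # hypothetical data
plug_in = statistics.NormalDist(statistics.fmean(sample), statistics.stdev(sample))
for x in (-1.0, 0.0, 1.0, 2.0):
    print(x, round(umvue_normal_cdf(x, sample), 4), round(plug_in.cdf(x), 4))
```

Unlike the plug-in estimate (b), $\hat\Phi(x)$ reaches exactly 0 or 1 once $v = 1$, that is, once $|x - \bar X|$ exceeds the largest possible value of $|X_1 - \bar X|$ given $s^2$.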





874 NEWS AND NOTICES 


34. On the Construction of Fractional Factorial Designs (Preliminary Report),
Robert C. Burton, National Bureau of Standards.


For a $1/s^p$ fraction of an $s^n$ factorial design, the identity relationship contains $(s^p - 1)/(s - 1)$
"words." For example, if $s = 3$, $n = 6$ and $p = 2$, the factors may be denoted by
$A, B, \ldots, F$, and a possible identity relationship is $I = ABC^2D^2 = CDEF = ABEF = ABCDE^2F^2$,
where $I$ is the identity element and $ABC^2D^2$, $CDEF$, $ABEF$ and $ABCDE^2F^2$
are the four words. Necessary and sufficient conditions are given for constructing an
identity relationship containing words having prescribed numbers of letters of each power.
The argument consists of regarding the words as sets of elements and showing that they
may be expressed as unions of certain disjoint sets. (Received July 9, 1956.)
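The words of an identity relationship are the non-identity elements of the group generated by $p$ independent words, with a word and its nonzero powers counted once. The Python sketch below, assuming $s$ is prime so that exponents can be handled modulo $s$, enumerates that group and reproduces the example above. The function names and the choice of $ABC^2D^2$ and $CDEF$ as generators are our illustration; Burton's conditions concern prescribing the letter counts, which this sketch does not address.

```python
from itertools import product

LETTERS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

def identity_relationship(generators, s):
    """Distinct words generated by `generators` (exponent tuples mod s, s prime).
    W and W^k (k != 0 mod s) denote the same word, so each word is normalised
    to have first nonzero exponent 1."""
    n = len(generators[0])
    words = set()
    for coeffs in product(range(s), repeat=len(generators)):
        w = [sum(c * g[i] for c, g in zip(coeffs, generators)) % s for i in range(n)]
        if not any(w):
            continue                      # the identity element I itself
        inv = pow(next(e for e in w if e), -1, s)
        words.add(tuple(e * inv % s for e in w))
    return sorted(words)

def spell(word):
    """Write an exponent tuple as letters, e.g. (1, 1, 2, 2, 0, 0) -> 'ABC^2D^2'."""
    return "".join(LETTERS[i] + (f"^{e}" if e > 1 else "") for i, e in enumerate(word) if e)

# The abstract's example: s = 3, n = 6, p = 2, generated by ABC^2D^2 and CDEF.
words = identity_relationship([(1, 1, 2, 2, 0, 0), (0, 0, 1, 1, 1, 1)], s=3)
print(" = ".join(["I"] + [spell(w) for w in words]))
# prints: I = CDEF = ABEF = ABCDE^2F^2 = ABC^2D^2
```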


NEWS AND NOTICES 


Readers are invited to submit to the Secretary of the Institute news items of interest.


Personal Items 


The Mathematics Department of Catholic University is pleased to announce 
a lecture series in the field of Mathematical Statistics. The lecture series will 
start with the beginning of the academic year 1956-57. It is planned to invite 
prominent mathematical statisticians from the East Coast or visiting the East 
Coast to give addresses. The cooperation of other universities and interested 
organizations in the area will be sought. Enquiries concerning the lecture series 
may be addressed to the Department of Mathematics of Catholic University. 




Dr. Raymond H. Burros has accepted an appointment as operations research 
analyst with Technical Operations, Inc. and is attached to the Combat Opera- 
tions Research Group, Fort Monroe, Virginia. During the academic year 1954- 
1955, Dr. Burros was visiting associate professor of psychology in the University 
of Houston. In the fall of 1955 he held a temporary appointment as director, 
Validation Department, Institute for Motivational Research, Inc. 

Herbert T. David, lecturer in the Committee on Statistics of the University 
of Chicago, has accepted a position at Iowa State College, beginning in Septem- 
ber 1956, as assistant professor in the Statistical Laboratory and the Depart- 
ment of Statistics for teaching, research and consulting work in industrial 
statistics. 

Richard De Lancie, now a partner in Broadview Research and Development, 
has returned to the home office in Burlingame, California, after establishing a 
Washington, D. C. office for the firm. 

James R. Duffett, formerly at White Sands Proving Ground, is now at Radio 
Plane Company, Van Nuys, California. 

The Copley Medal has been awarded by the Royal Society to Sir Ronald 
Fisher, F.R.S., Arthur Balfour Professor of Genetics, Cambridge University, 
for contributions to developing the theory and application of statistics for making 
quantitative a vast field of biology. 







Marvin A. Kastenbaum was awarded a Ph.D. degree in Experimental Sta- 
tistics at North Carolina State College in January 1956. He is now employed as a 
statistician on the Mathematics Panel at the Oak Ridge National Laboratory. 

Wharton F. Keppler has transferred from an Operations Analyst position, Air 
Proving Ground Command, Eglin AFB, Florida, to the position of Chief, 
Statistics Section, Engineering Sciences Division, Signal Communications Dept., 
Army Electronic Proving Ground, Fort Huachuca, Arizona. 

Frank J. Massey, Jr. has accepted an appointment as Associate Professor of 
Biostatistics, School of Public Health, University of California at Los Angeles. 

Paul L. Meyer has accepted the position of Assistant Professor in the Depart- 
ment of Mathematics at Washington State College. Since the completion of his 
Ph.D. at Stanford University in 1954, he has been serving his tour of duty as a 
statistician with the Chemical Corps of the U. S. Army. 

Joseph M. Moser started work on his Ph.D. degree in mathematics in the second semester
of the year 1955-1956, after spending seven months working for
the U.S. Army Ordnance Corps at Aberdeen, Maryland.

George McLaughlin, who received his M.A. in mathematical statistics at the 
University of Toronto in 1955, is now statistician for the Canadian Armament 
Research and Development Establishment, Valcartier, Quebec, Canada. 

John F. Pauls has been given the George W. Snedecor Award in Statistics for 
1956 by vote of the graduate faculty in statistics at Iowa State College. The 
award, established by the Statistical Laboratory to honor its first director, is 
given annually to the most outstanding candidate for the Ph.D. degree in sta- 
tistics at the college. Pauls has accepted a position as research statistician with 
Smith, Kline & French Laboratories in Philadelphia, beginning in June 1956.

K. C. Sreedharan Pillai has been deputed for a year by the United Nations 
Technical Assistance Administration to the Philippines as UN Senior Statistical 
Advisor, at the Statistical Center, University of the Philippines. 

Dr. Paul R. Rider of the Wright Air Development Center, Wright-Patterson 
Air Force Base, Ohio, will spend the summer on temporary duty with the Summer 
Research Group of the Holloman Air Development Center, at Cloudcroft, N. M. 

Gerhard Tintner has been granted leave from Iowa State College to accept a 
position as visiting professor of econometrics at the University of Vienna, 
Austria, for the 1956-57 academic year. During his absence, William M. Gorman, 
lecturer and head of the Department of Economics & Statistics, University of 
Birmingham, England, will teach courses in econometrics and econometric
statistics at Iowa State under a visiting professorship.

John S. White, formerly Assistant Professor of Statistics at the University of 
Manitoba, has accepted a position in the Operations Analysis Department of 
Ball Brothers Co., Muncie, Indiana. 

Robert F. White received the degree of Master of Science with statistics major 
in June at Iowa State College; he is continuing work there toward a Ph.D. in 
statistics. 

Martin B. Wilk has resigned from the Statistical Laboratory, Iowa State
College, effective June 30, 1956 to remain another year at Princeton University,
doing statistical research.

Samuel Zahl, formerly working for the United Community Services of Boston, 
Mass., is now working for R.C.A. at their Aviation Systems Lab. in Waltham, 
Mass., as a mathematical statistician. 

Zenon Szatrowski, Chairman of the Statistics Department, School of Business 
Administration, University of Buffalo, is on leave for the period 1955-1957. His 
present position is that of Staff Consultant at the Scientific Computing Center, 
International Business Machines Corporation, 590 Madison Avenue, New York 
City. He is working on the application of electronic computers to statistical 
problems. 




New Members 
The following persons have been elected to membership in the Institute 
February 11, 1956 to May 15, 1956 


Aries, Robert S., D.Chem.Eng. (Polytechnic Inst. of Brooklyn), President of R. S. Aries
and Associates, 270 Park Avenue, New York 17, New York. 

Baker, Frances Ellen, Ph.D. (University of Chicago), Professor of Mathematics, Vassar 
College, Poughkeepsie, New York. 

Bradley, Milton N., M.I.E. (New York University), Quality Control Engineer, Radio Cor- 
poration of America, Tube Division, 415 South 5 Street, Harrison, New Jersey, 2481 
Davidson Avenue, Bronx 68, New York. 

Braumann, Pedro Bruno Teodoro, Math.D. (Faculty of Sciences of Lisbon, Portugal),
Assistant at the Faculty of Sciences of Lisbon, Portugal, Rua da Escola Politécnica,
Lisbon, Portugal. 

Chew, Victor, B.Sc. (Univ. of Western Australia), Asst. Prof. Dept. of Agronomy, College
of Agriculture, Univ. of Florida, Box 3568, University Station, Gainesville, Florida.

Cohen, P. M., B.S. (Virginia Polytechnic Institute), Graduate Assistant, Statistics
Department, Stanford University, Stanford, California.

Cornell, R. G., M.S. (Va. Polytechnic Inst.), Graduate Fellow, Oak Ridge National Labora- 
tory, Institute of Nuclear Studies, P.O. Box P, Oak Ridge, Tennessee. 

Cramer, Elliot M., B.S. (Mass. Inst. of Tech.), Graduate Student, Department of Psy- 
chology, The Johns Hopkins University, Baltimore, Maryland. 

Derwort, Bernard John, Ph.D. (St. Louis University), Instructor, Mathematics Depart- 
ment, St. Louis University, St. Louis, Missouri. 

Hagatong, A. F. DeOliveira, Ec.D. (Lisbon Technical University), Statistician, Economist, 
Comissão Reguladora do Comércio de Bacalhau, do Ministério da Economia, Alcantara,
Lisboa, Portugal, Rua de Moçambique, 21-1.°, Lisboa, Portugal.

Hammersley, J. M., M.A. (Oxford), Lecturer, Trinity College, Oxford, England, Theoretical 
Physics Division, United Kingdom Atomic Energy Research Establishment, Harwell,
Berkshire, England. 

Hatori, Tsukasa, B.S. (Tokyo Imperial Univ.), Asst. Prof., Dept. of Mathematics, Tokyo 
Literary University, 35 3-chome, Simouma Setagaya-ku, Tokyo, Japan, 266 Sakuragaoka 
Hodogaya-ku, Yokohama, Japan. 

Hromi, John D., M.Litt. (Math), (Univ. of Pittsburgh), Technologist, U. S. Steel Corp.
Applied Research Laboratory Monroeville, Pennsylvania. 

Klauber, Melville R., A.B. (Stanford Univ.), Research Assistant, Applied Mathematics and
Statistics Laboratory, Stanford University, Stanford, California, Bldg. 201, Apt. 3,
Stanford Village, Stanford, Calif.

Krumbein, William C., Ph.D. Geology (University of Chicago), Professor of Geology, 
Northwestern University, Evanston, Illinois. 

Kullback, Joseph Henry, B.A. (George Washington University), Graduate Student, Stan- 
ford University, Department of Statistics, Stanford, Calif. 

Kwo, T. T., D.Eng.Sc. (Columbia University), Research Engineer, General Electric Co., 
Appliance Park, Louisville 1, Kentucky. 

Ladouceur, J. C., M.Sc.(Math.) (University of Montreal), Professor, Department of Mathe- 
matics, College Militaire Royal de St.-Jean, St. Jean, P. Q., Canada. 

Leckie, D. S., M.S. (Case Inst. of Technology), Quality Control Engineer, Republic Steel 
Corp. 1527 Republic Bldg., Cleveland 1, Ohio, 5806 Clinton Avenue, Cleveland 2, Ohio. 

Lewis, Julian I., B.S. (Penna. State University), Asst. Adm., Pharmacology Dept., Hoff- 
mann-La Roche Inc., Kingsland Road, Nutley 10, New Jersey. 

Lindholm, Carroll R., MSEE (Illinois Inst. of Technology), Electronics Engineer, Motorola 
Research Laboratory, 8330 Indiana Avenue, Riverside, Calif., 5831 Willard Way, River- 
side, Calif. 

McLoughlin, G., M.A. (New York University), Mathematical Statistician, Office of The 
Chief of Transportation, Dept. of the Army, General Traffic Division, Bldg. T-7, 
Washington 25, D. C., 800 South Washington St., Bldg. A., Apt. 104, Alexandria, Virginia. 

Oliver, Robert M., B.Sc. (Mass. Inst. of Technology), student, M.I.T., Cambridge, 39, 
Mass., 2 Grey Gardens East, Cambridge 38, Mass. 

Patwary, Kamini M., M.S., Statistics (Virginia Polytechnic Institute), Graduate Assistant,
Statistics Department, American University, 1901 F St., N. W., Washington,
D. C., 1781 21st St., N. W., Washington, D.C. 

Pruitt, William E., B.S., (Oklahoma A. and M. College), Research Assistant, Mathematics 
Department, Oklahoma A. and M. College, Stillwater, Oklahoma. 

Rao, Cayampudi Radhakrishna, Ph.D. (Cambridge), Head of Div. of Theoretical Research
and Training, Indian Statistical Inst., 203 B. T. Road, Calcutta 35, India.

Ravitch, Herman, B.S. (Univ. of Penna.), Statistician in meteorology (private in U. S.
Army, 52 341 509), 9470 TU Det 5, Fort Huachuca, Arizona. 

Rowan, W. H., B.S. (Wayne Univ.) Sr. Research Chemist, Metal and Thermite Corp., 1700 
East Nine Mile Road, Ferndale Station, Detroit 20, Michigan. 

Sligo, Joseph R., Ph.D. (State University of Iowa), Lecturer in Mathematics, University 
of Nevada, Reno, Nevada, 1791 W. 4th St., Reno, Nevada. 

Thomson, Barbara, B.A., (Univ. of Oregon), Graduate Assistant, Dept. of Statistics, 
Stanford University, Stanford, California. 

Thompson, James E., M.S. (Univ. of Wisconsin), Analyst, Dept. of Defense, Washington 
25, D. C., 812-20th St., N.W., Washington 6, D.C. 

Williams, Gregory P., A.B. (Columbia College), Statistical Analyst, General Electric Co., 
Evandale, Ohio, 2532 Highland Ave., Cincinnati 19, Ohio. 


INSTITUTIONAL MEMBERS 


Bell Telephone Laboratories Inc., Technical Library, 463 West St., New York 14, N. Y. 
Slepian, Dr. D., (Designated Representative for Bell Tel. Labs. Inc.), Bell Telephone 
Laboratories, Inc., Murray Hill, New Jersey. 


Pauls, John F., Statistical Laboratory (representative for Iowa State College), Ames, Iowa.




Iowa State College to Offer Undergrad Scholarships in Statistics 


The Statistical Laboratory of Iowa State College has been given a grant of
funds by the Westinghouse Electric Corporation to support four annual undergraduate
scholarships of $500 each. Known as the Westinghouse Scholarships in
Statistics, they will be awarded to students who (1) will be enrolled as statistics 
majors (juniors or seniors) at ISC during the year for which scholarships are 
given and (2) will have completed a year of college calculus before that time. 
The awards will be made by a committee in the Statistical Laboratory on the
basis of general scholastic achievement, ability and need for financial assistance.
According to D. W. Gunther, Manager of the Materials Engineering Depart- 
ment, Westinghouse believes that such an undergraduate cooperative scholar- 
ship program would help attract students to statistics as a career and thus in- 
crease the supply of statisticians available to industry. 

Another recent move in the same direction was the establishment of a $1350 
graduate fellowship in industrial statistics at Iowa State for the 1956-57 year.
The nine-month fellowship, which carries no duties or obligations, is supported 
by a grant from the B. F. Goodrich Chemical Company and administered 
through the college Alumni Achievement Fund. 




PUBLICATIONS RECEIVED 


Curva Logística de la Población de España, Instituto Nacional de Estadística, Ferraz, 41,
Madrid, Spain, Serie A, Demografía 1, 1956, 31 pp.

Judson, Lewis V., Units and Systems of Weights and Measures, National Bureau of Standards
Circular 570, April 1956, 29 pp., 12 halftone illustrations, 25 cents. (Order from
the Government Printing Office, Washington 25, D. C.)

Tables of the Cumulative Binomial Probabilities, Ordnance Corps, PB 111389, September
1952, 575 pp., $6.00. (Order from Office of Technical Services, Department of
Commerce, Washington 25, D. C., Attention: James E. Wheat, Jr., Chief of Publications
and Information.)











TRABAJOS DE ESTADISTICA 


Review published by the "Instituto de Investigaciones Estadísticas" of the "Consejo
Superior de Investigaciones Científicas," Madrid, Spain.


Vol. VII CONTENTS Cuad. I


J. Romani Test no paramétricos en forma secuencial.

E. Jaso, J. Besan, A. Angsio y S. Ríos Estudio de la evolución y relación de medidas antropométricas
en los niños menores de un año.

Notas

J. CastTaSer La programación lineal y la Teoría Económica. Aplicaciones de la Programación lineal.

Crónicas. Bibliografía. Cuestiones y Ejercicios.


Vol. VII CONTENTS Cuad. II


J. Besar Regresión en mediana y la programación lineal.
J. Romani Distribución de la suma algebraica de variables de Poisson.
J. Talacko Perks' distributions and their role in the theory of Wiener's stochastic variables.
B. M. Bennett Note on the Poisson index of dispersion.


Notas.


S. Ríos Métodos y Sree de la investigación operativa.


R. San Juan El método del "simplex" en la programación lineal.
G. Arnaiz Inspección de materiales fabricados.
Crónica. Bibliografía. Cuestiones y Ejercicios.


For all matters concerning papers, exchanges and subscriptions, write to Professor Sixto Ríos, Instituto
de Investigaciones Estadísticas of the Consejo Superior de Investigaciones Científicas (Serrano, 123), Madrid,
Spain. The Review is published in three fascicles a year (about 350 pages in all), and its annual
price is 100 pesetas for Spain and South America and $4.00 U.S.A. for all other countries.





ECONOMETRICA 


Journal of the Econometric Society 


Contents of Vol. 24, No. 3 - July, 1956 


Hans Brems The Foreign Trade Accelerator and International Transmission of Growth
W. Beckerman The World Trade Multiplier and the Stability of World Trade 1938 to 1953
Rudolf J. Freund The Introduction of Risk into a Programming Model
Donald Davidson and Patrick Suppes A Finitistic Axiomatization of Subjective Probability and
Utility
D. van Dantzig Economic Decision Problems for Flood Prevention
Alain C. Enthoven and Kenneth J. Arrow A Theorem on Expectations and the Stability of
Equilibrium
K. S. Banerjee Note on the Optimum Allocation of the Number of Items in the Construction of a
Cost of Living Index
K. S. Banerjee Simplification of the Derivation of Wald's Formula for the Cost of Living Index
Report of the Kiel Meeting
Report of the New York Meeting
Book Reviews


Published Quarterly. Subscription rates available on request.


The Econometric Society is an international society for the advancement of economic theory in its 
relation to statistics and mathematics 

Subscriptions to Econometrica and inquiries about the work of the Society and the procedure in applying 
for membership should be addressed to Richard Ruggles, Secretary, The Econometric Society, Box 
1264, Yale University, New Haven, Connecticut. 





BIOMETRIKA 


Volume 43 Contents Parts 1 and 2, June 1956 


KENDALL, M. G. Studies in the history of probability and statistics. II. The beginnings of a probability
calculus. BAILEY, N. T. J. On estimating the latent and infectious periods of measles. I. Families with
two susceptibles only. DARWIN, J. H. The behaviour of an estimator for a simple birth and death process.
BROADBENT, S. R. Examination of a quantum hypothesis based on a single set of data. GOOD, I. J.
& TOULMIN, G. H. The number of new species, and the increase in population coverage, when a sample
is increased. BARTHOLOMEW, D. J. A sequential test of randomness for events occurring in time or
space. ANIS, A. A. On the moments of the maximum of partial sums of a finite number of independent
normal variates. DAVID, H. A. On the application to statistics of an elementary theorem in probability.
WISHART, J. χ² probabilities for large numbers of degrees of freedom. HALDANE, J. B. S. & MAYNARD
SMITH, Sheila. The sampling distribution of a maximum-likelihood estimate. BARTON, D. E.
& DAVID, F. N. Tests for randomness of points on a line. BOSE, R. C. Paired comparison designs for testing
concordance between judges. PILLAI, K. C. S. On the distribution of the largest or the smallest
root of a matrix in multivariate analysis. LAWLEY, D. N. Tests of significance for the latent roots of
covariance and correlation matrices. WATSON, G. S. On the joint distribution of the circular serial
correlation coefficients. DANIELS, H. E. The approximate distribution of serial correlation coefficients.
JENKINS, G. M. Tests of hypotheses in the linear autoregressive model. II. Null distributions for higher
order schemes: non-central distributions. WILLIAMS, R. M. The variance of the mean of systematic
samples. GUEST, P. G. Grouping methods in the fitting of polynomials to unequally spaced observations.
Miscellanea: Contributions by D. E. Barton, J. C. [illegible] Harvey, M. J. R. Healy, G. S. James,
W. H. Trickett & B. L. Welch, C. L. Mallows, J. A. McFadden, J. [illegible], S. N. Roy & A. E. Sarhan,
J. Wise, R. A. Wooding


Reviews. Corrigenda


The subscription price, payable in advance, is 45s. inland, 54s. export (per volume including postage). Cheques 
should be drawn to Biometrika and sent to “The Secretary, Biometrika Office, Department of Statistics, 
University College, London, W.C. 1." All foreign cheques must be in sterling and drawn on a bank
having a London agency. 





MATHEMATICAL REVIEWS 


A journal containing reviews of the mathematical literature
of the world, with full subject and author indices


Publication of this journal is sponsored by the American Mathematical
Society, Mathematical Association of America, Institute of
Mathematical Statistics, London Mathematical Society, Edinburgh
Mathematical Society, Unión Matemática Argentina, and others


Subscriptions accepted to cover the calendar year only. 
Issues appear monthly except July. $20.00 per year. 


Send subscription order or request for sample copy to 


AMERICAN MATHEMATICAL SOCIETY 
80 Waterman Street, Providence 6, Rhode Island 


JOURNAL OF THE ROYAL STATISTICAL SOCIETY


Series B (Methodological) 


Vol. XVII, No. 2, 1955 


Some Statistical Methods Connected with Series of Events, D. R. Cox (with Discussion)
Symposium on Linear Programming:
An Outline of Linear Programming, S. Vajda
On Minimizing a Convex Function Subject to Linear Inequalities, E. M. L. Beale
A Contribution to the "Travelling-Salesman" Problem, G. Morton and A. H. Land (with Discussion)
Statistical Concepts in their Relation to Reality, E. S. Pearson
The Comparison of Means of Sets of Observations from Sections of Independent Stochastic Series, G. H. Jowett
A Note on the Periodogram of the Beveridge Wheat Price Index, J. C. Gower
Some Distribution and Moment Formulae for the Markov Chain, P. Whittle
Some Applications of Zero-one Processes, Z. A. Lomnicki and S. K. Zaremba
Waiting Time in Bulk Service Queues, F. Downton
A Note on Equalizing the Mean Waiting Times of Successive Customers in a Finite Queue, Norman T. J. Bailey
On the Weighted Combination of Significance Tests, I. J. Good
A Significance Test for the Difference in Efficiency between Two Predictors, M. J. R. Healy
A Unified Theory of Finite Sampling, V. P. Godambe
The Royal Statistical Society, 21, Bentinck Street, London, W. 1

The Journal of the Royal Statistical Society is published in two series: Series A (General), four issues a year,
15s. each part, annual subscription £3 1s. post free; Series B (Methodological), two issues a year, 22s. 6d. each
part, annual subscription 45s. 6d. post free.







SANKHYA 


The Indian Journal of Statistics 
Edited by P. C. Mahalanobis 


Vol. 16, Parts 3 & 4, 1955 
Biometry, J. B. S. Haldane
Is Biometry a Separate Discipline?, H. Spurway


A Problem in the Sequential Design of Experiments, Richard Bellman


National Sample Survey: Number Seven: Couple Fertility 


ANNUAL SUBSCRIPTION: 30 rupees ($10.00); 10 rupees ($3.50) per issue.
BACK NUMBERS: 45 rupees ($15.00) per volume; 12/8 rupees ($4.50) per issue.
Subscriptions and orders for back numbers should be sent to 


STATISTICAL PUBLISHING SOCIETY 
204/1 Barrackpore Trunk Road Calcutta 35, India 







L. J. Savage, Committee on [illegible]
[illegible] Illinois




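The purpose of the Institute of Mathematical Statistics is to encourage the development, dissemination,
and application of mathematical statistics.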


ANNALS OF MATHEMATICAL STATISTICS


of the United States 







On Minimum Variance [illegible] Linear [illegible] of Order Statistics.
K. [illegible]


A Note on "Some Further Results in Simultaneous Confidence Interval Estimation." S. N. Roy
A Note on the Normal Distribution. Seymour Geisser
Correction to "An Application of Information Theory [illegible]." [illegible]
Abstracts of Papers, News and Notices, Publications




MEETINGS OF THE INSTITUTE
TENTATIVE SCHEDULE


ANNUAL MEETING: Atlantic City, New Jersey, [illegible] September, 1957




Abstracts should be [illegible] abstract blanks, which [illegible]. Those received by April [illegible]
will [illegible] Annals; by July 31, [illegible] December, etc. Abstracts should be limited to 200 words or
[illegible] equivalent, and should avoid [illegible] complicated formulae. [illegible] can be accepted from [illegible]





